
Final thesis

Timing and Synchronization over Ethernet

by

Emil Lundqvist

LiTH-ISY-EX--15/4824--SE

February 20, 2015


Supervisor, ISY: Andreas Ehliar

Supervisor, : Victor

Examiner: Olle Seger

Abstract

This thesis investigates how time and frequency can be synchronized over Ethernet with the help of the Precision Time Protocol and Synchronous Ethernet. The goal is to achieve high synchronization accuracy when a topology of 10 cascaded nodes is used. The Precision Time Protocol can be implemented in different ways; these approaches are investigated and the one that gives the best accuracy is proposed. The thesis also covers how to recover a radio frequency, a multiple of 3.84 MHz, from Ethernet's 10.3125 GHz line rate.

The best accuracy for time and phase synchronization is achieved by using hardware support for the timestamps and transparent clocks in the forwarding nodes. Combining this with Synchronous Ethernet for frequency synchronization, to get a traceable clock through the system, leads to the best result. The total error does not need to be greater than 1.46 ns if the asymmetry in the medium is neglected and a well-designed PCS and FIFO are used. The radio frequency is recovered from Ethernet by using the highest common frequency; either an integer phase-locked loop or a fractional phase-locked loop can be used. The fractional phase-locked loop gives a better result but contributes spurs that the integer phase-locked loop does not.


Acknowledgements

First of all I want to thank my supervisor Victor at the company for all the guidance and help during the thesis. I also want to thank all the experts at the company that my supervisor put me in contact with, especially Andres and Stefan, who were very helpful in discussing the problems I encountered during the thesis. I also want to thank my boss, Pierre, who gave me the opportunity to do my thesis at the company.

From the University I want to thank my fellow students for the good time as a student. I want to give an extra big thanks to Viktor Classon for reviewing my report and acting as my opponent for the thesis. I also want to thank my examiner Olle Seger and my supervisor Andreas Ehliar for their help during this thesis and for making it possible to accomplish.

Most of all I want to thank my parents Lars and Maria for supporting me during my time of studies and my partner Ellen Selling, who has been very supportive during the thesis. Finally, I also want to thank my friends and previous neighbors, Alexander and Ruby Peck, for reviewing the report.

Emil Lundqvist
Stockholm, February 2015


Contents

1 Introduction
    1.1 About the work
        1.1.1 Communication protocols
    1.2 Presentation of the problem
    1.3 Restrictions
        1.3.1 Topologies
        1.3.2 The system
        1.3.3 Time error budget
    1.4 Related research
    1.5 Outline

2 Background
    2.1 Ethernet
        2.1.1 Media access control
        2.1.2 Physical coding sublayer
    2.2 Precision time protocol
        2.2.1 Synchronization
        2.2.2 Different types of clocks
    2.3 Synchronous Ethernet
    2.4 Expressions
        2.4.1 parts per million
        2.4.2 Free running
        2.4.3 Hop
        2.4.4 Topology
        2.4.5 Ingress
        2.4.6 Egress

3 Time Accuracy
    3.1 Problems
        3.1.1 Reference point
        3.1.2 Resolution of the timestamp
        3.1.3 Asymmetry
        3.1.4 Frequency accuracy
        3.1.5 Packet delay variation
    3.2 Possible solutions
        3.2.1 Reference point
        3.2.2 Resolution of timestamp
        3.2.3 Asymmetry
        3.2.4 Precision time protocol implementation
        3.2.5 Frequency accuracy

4 Frequency recovery
    4.1 Problems
    4.2 Possible solutions

5 Result

6 Conclusion

List of Figures

1.1 Shows how a tree topology is estimated with a chain topology by choosing the longest path
1.2 A chain topology with 4 nodes and 3 hops

2.1 An overview of 10GBASE-R Ethernet, an explanation for the abbreviations can be found in Table 2.2
2.2 An Ethernet packet with the following IPG
2.3 A block diagram over the PCS
2.4 FIFO with N positions and 64 bits in each position
2.5 An example how a serial scrambler can be implemented, where the operators are XOR-gates
2.6 An example how a serial descrambler can be implemented, where the operators are XOR-gates
2.7 A two-step synchronization in PTP
2.8 Illustration of a boundary clock
2.9 Illustration of a transparent clock
2.10 Three nodes implemented with Synchronous Ethernet

3.1 A simple system of two nodes
3.2 Different choices for the PTPs reference point
3.3 Showing the asymmetry in a node
3.4 An example where data1 writes to the FIFO with clock1 and data2 reads from the FIFO with clock2
3.5 The chart shows the variable delay in the FIFO for different bit widths between PCS and PMA
3.6 Show a network where a boundary clock is useful to reduce the workload from the grandmaster
3.7 The time estimation in a transparent clock
3.8 The difference between End-to-End (solid line) and Peer-to-Peer (dotted line) delay estimation
3.9 Network load for different PTP message

4.1 Block diagram over an integer PLL
4.2 Phase noise graph for an integer PLL
4.3 Block diagram over a fractional PLL
4.4 Phase noise graph for a fractional PLL
4.5 Phase noise graph for a fractional PLL with reduced spurs

List of Tables

2.1 The different layers in the OSI Model
2.2 An explanation for the abbreviations used in Figure 2.1

3.1 Show the message size for the PTP message in bytes

Chapter 1

Introduction

1.1 About the work

This document is a master's thesis by a student at Linköping University. The work is the last step in the master's programme Applied Physics and Electrical Engineering, System on Chip. The work gives the student 30 hp of the 120 hp that a master's degree contains; these 30 hp correspond to 20 weeks of studies. The student will graduate from the Department of Electrical Engineering in Linköping and the work will be done at .

The knowledge necessary for this work is an understanding of the different communication protocols that are used and of how the timing is affected by them. This section gives a short introduction to these protocols.

1.1.1 Communication protocols

Ethernet is one of the most widely used data communication standards in the world. The standard was published in 1985 by the Institute of Electrical and Electronics Engineers (IEEE) and is defined as IEEE 802.3 [2]. The communication standard is asynchronous and based on data packets; it is discussed further in Section 2.1. In this thesis the 10GBASE-R standard will be used.

Precision Time Protocol (PTP) is a protocol designed to synchronize real-time clocks over a network, such as Ethernet. The first version was published in 2002 and the second (and latest) version in 2008. The protocol is defined as IEEE 1588 [3]. A more detailed overview of the protocol is given in Section 2.2.

Synchronous Ethernet (SyncE) is a recommendation from the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) on how a network can be set up to achieve good frequency synchronization. SyncE is discussed further in Section 2.3.


1.2 Presentation of the problem

Two different tasks will be investigated in this thesis: the first concerns time and synchronization and the second concerns how to achieve a required frequency. Both will be studied using 10GBASE-R Ethernet as the communication protocol between the nodes.

The first task, regarding time and synchronization, is how to get a system with many nodes to have the same perception of time. There is a known way to distribute time over Ethernet, called PTP, but it can be implemented in different ways, which gives it different properties. The focus will be on good accuracy combined with a relatively low cost. A recommended solution will be presented and the time accuracy will be estimated.

The frequency issue is how a telecommunication frequency can be recovered from Ethernet. The transmitting frequency for Ethernet is 10.3125 GHz when 10GBASE-R is used, while the wanted radio frequency is a multiple of 3.84 MHz, and the 10GBASE-R frequency is not such a multiple.
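The mismatch between the two frequencies can be illustrated with a short Python sketch. It assumes that a useful common frequency is the greatest common divisor of the two rates expressed in Hz; the constant and function names are illustrative and do not come from the thesis.

```python
from math import gcd

LINE_RATE_HZ = 10_312_500_000   # 10GBASE-R line rate, 10.3125 GHz
RADIO_BASE_HZ = 3_840_000       # radio base frequency, 3.84 MHz


def common_frequency_hz(f1_hz: int, f2_hz: int) -> int:
    """Largest frequency that divides both inputs an integer number of times."""
    return gcd(f1_hz, f2_hz)


common = common_frequency_hz(LINE_RATE_HZ, RADIO_BASE_HZ)
print(common)                    # common frequency in Hz
print(LINE_RATE_HZ // common)    # line-rate periods per common period
print(RADIO_BASE_HZ // common)   # radio periods per common period
```

With these inputs the greatest common divisor is 60 kHz, which fits 171875 times into the line rate and 64 times into 3.84 MHz, so the line rate itself is not an integer multiple of 3.84 MHz.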

1.3 Restrictions

1.3.1 Topologies

The only topology that will be investigated is the chain topology with cascaded nodes. Another topology that could be of interest is the tree topology, but for timing the worst case is the longest path. This can be estimated with a single chain topology that has the same number of nodes as the longest path. An example can be seen in Figure 1.1, where the solid line in Figure 1.1a is the longest path and can be represented by the chain topology in Figure 1.1b.

(a) Tree topology: master M with slaves S1 to S7
(b) Chain topology: nodes M, S1, S4 and S7

Figure 1.1: Shows how a tree topology is estimated with a chain topology by choosing the longest path

1.3.2 The system

The system that will be used is one master node with a number of slave nodes that shall be synchronized to the master. The slaves are arranged in a chain topology where each node has a maximum of two logical connections, one upstream and one downstream. The maximum number of nodes is 10: 1 master node and 9 slave nodes, which gives a maximum of 9 hops. A smaller system of 4 nodes can be seen in Figure 1.2 as an illustration of the expressions used.

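[Diagram: one Master and three Slaves connected in a chain; a box is labelled "node" and a link between two boxes is labelled "hop"]

Figure 1.2: A chain topology with 4 nodes and 3 hops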

1.3.3 Time error budget

A network is commonly built from components from different manufacturers, and the system places certain requirements on the final network that shall be fulfilled. It is therefore beneficial to divide the requirements between different parts of the final network and give each part its own requirement, or budget, to follow. This thesis considers a system with cascaded nodes, and the focus is on the nodes and not on the links between them. The time error from the cable belongs to another time budget that is not covered to the same extent in this thesis. It will therefore be mentioned but not investigated in the same way as the nodes.

1.4 Related research

There are many studies on timing and synchronization over a network, or on the protocols that are used in this thesis. Not all of these articles are relevant here, but some of those most closely related to the subject are mentioned below.

For example, one study examines how the different implementation methods behave in a highly cascaded network [14]. Another study describes how a transparent clock can be implemented [13]; it also discusses different sources of the time error.

There is also an article that describes how boundary clocks can be used in telecom networks and discusses how SyncE can be used together with PTP [15].

Besides the articles, there is also a book that can be used for some basic understanding [12]. The book was unfortunately written before the second version of PTP was released. It covers many of the features that were released in the second version of PTP but does not go into all the details.

1.5 Outline

An introduction to the thesis and its restrictions is given in Chapter 1. Chapter 2 explains the background and the different protocols, together with some expressions that are used throughout the thesis. Chapter 3 handles the time problem: the first section explains the problem and the second section discusses different solutions. Chapter 4 discusses the different frequencies, what the problem is and what a possible solution can look like. Chapter 5 goes through the result of the thesis and summarizes how timing and synchronization can be transferred over Ethernet with the solution that gives the best possible accuracy. The last chapter, Chapter 6, gives some conclusions and a comparison with measurements from another system.


Chapter 2

Background

2.1 Ethernet

Ethernet is a family of standards for communication over a physical medium in computer networks. It is the most common technology in local area networks (LAN), and the IEEE 802.3 working group has released many standards since the first one, which was released in 1982.

In this thesis IEEE 802.3-2012 will be used. This standard describes the physical layer (PHY) [9] and the media access control (MAC) part of the data link layer [8]. In the open systems interconnection model (OSI model) [6] these are found in layer 1 and layer 2. All the layers of the OSI model are listed in Table 2.1.

Layer number    Name
7               Application Layer
6               Presentation Layer
5               Session Layer
4               Transport Layer
3               Network Layer
2               Data Link Layer
1               Physical Layer

Table 2.1: The different layers in the OSI Model

Only 10GBASE-R is of interest in this document; Figure 2.1 gives an overview of the system. The most important block is the physical coding sublayer (PCS), which is described in more detail in Section 2.1.2.


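[Diagram: 10GBASE-R layer stack, from top to bottom: MAC, Reconciliation, XGMII, PCS, Serial PMA, PMD, MDI, Medium; the blocks are grouped into the data link layer and the physical layer]

Figure 2.1: An overview of 10GBASE-R Ethernet, an explanation for the abbreviations can be found in Table 2.2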

Abbreviation    Explanation
MAC             Media Access Control
XGMII           10 Gigabit Media Independent Interface
PCS             Physical Coding Sublayer
PMA             Physical Medium Attachment
PMD             Physical Medium Dependent
MDI             Medium Dependent Interface

Table 2.2: An explanation for the abbreviations used in Figure 2.1

2.1.1 Media access control

The MAC is described in [8]. The packets that are sent by the transmitter and received by the receiver are formatted as shown in Figure 2.2. An Ethernet packet consists of a preamble of 7 bytes, a start frame delimiter (SFD) of 1 byte, a MAC destination address of 6 bytes, a MAC source address of 6 bytes, a length of 2 bytes, a payload of 46-1500 bytes and a frame check sequence (FCS) of 4 bytes. The MAC destination, the MAC source and the length are often called the header of the Ethernet packet. According to the protocol an interpacket gap (IPG) needs to be sent after each packet; it has a standard minimum of 12 bytes. In total a new packet can be sent 84 to 1538 bytes after the beginning of the previous packet, assuming that the minimum IPG is used. When a packet is ready to be transmitted, it travels over the 10 Gigabit Media Independent Interface (XGMII) to the PCS. In this thesis the XGMII is 64 bits wide, which gives a transfer frequency of 156.25 MHz at every pin.

Preamble (7 bytes) | SFD (1 byte) | MAC destination (6 bytes) | MAC source (6 bytes) | Length (2 bytes) | Payload (46-1500 bytes) | FCS (4 bytes) | IPG (≥12 bytes)

Figure 2.2: An Ethernet packet with the following IPG
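The 84 to 1538 byte spacing between packet starts follows directly from the field sizes in Figure 2.2. The Python sketch below adds them up; the dictionary and function names are illustrative only.

```python
# Field sizes in bytes, taken from Figure 2.2 (payload and IPG handled separately).
OVERHEAD_BYTES = {
    "preamble": 7,
    "sfd": 1,
    "mac_destination": 6,
    "mac_source": 6,
    "length": 2,
    "fcs": 4,
}
MIN_IPG_BYTES = 12


def packet_spacing_bytes(payload_bytes: int, ipg_bytes: int = MIN_IPG_BYTES) -> int:
    """Bytes from the start of one packet to the earliest start of the next."""
    if not 46 <= payload_bytes <= 1500:
        raise ValueError("payload must be 46-1500 bytes")
    return sum(OVERHEAD_BYTES.values()) + payload_bytes + ipg_bytes


print(packet_spacing_bytes(46))     # smallest packet
print(packet_spacing_bytes(1500))   # largest packet
```

The fixed fields add up to 26 bytes, so the smallest payload gives 26 + 46 + 12 = 84 bytes and the largest gives 26 + 1500 + 12 = 1538 bytes.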

2.1.2 Physical coding sublayer

The PCS [9] consists of one transmitting part and one receiving part. Both parts are shown in Figure 2.3. One of the most important properties of this block is that the input and output have different bit widths and frequencies. It is up to the PCS to handle this.

In the PCS a group of 64 or 66 bits is called a block. The difference between them is that the encoder in the transmitting process adds a sync header of 2 bits to the block. One block always contains 8 bytes of data.

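[Block diagram of the Physical Coding Sublayer, placed between the XGMII and the PMA; the numbers are bus widths in bits]
Transmitter: XGMII -(64)-> FIFO -(64)-> Encoder -(66)-> Scrambler -(66)-> Gearbox -(W)-> PMA
Receiver:    PMA -(W)-> Block Sync -(66)-> Descrambler -(66)-> Decoder -(64)-> FIFO -(64)-> XGMII

Figure 2.3: A block diagram over the PCS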

In this thesis the connection to the XGMII has a width of 64 bits and a frequency of 156.25 MHz, which gives the correct bandwidth of 10 Gb/s. The width of the connection to the physical medium attachment (PMA) sublayer will be varied, and it will be investigated how the accuracy depends on this bit width. The frequency of the 10GBASE-R line is 10312.5 MHz because of the 64b/66b transmission code. With a bit width of W, the internal frequency in the PCS (between the First In First Out (FIFO) buffer and the PMA) is given by Equation (2.1).

$$ f_{PCS} = \frac{10312.5}{W}\ \text{MHz} \tag{2.1} $$
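To get a feeling for Equation (2.1), the Python sketch below evaluates it for a few bit widths. The chosen widths are only examples and are not a list taken from the thesis.

```python
LINE_RATE_MHZ = 10312.5  # 10GBASE-R line rate


def pcs_frequency_mhz(width_bits: int) -> float:
    """Internal PCS frequency for a PCS-to-PMA bus of the given width, Equation (2.1)."""
    if width_bits <= 0:
        raise ValueError("bit width must be positive")
    return LINE_RATE_MHZ / width_bits


for w in (16, 20, 32, 40, 64, 66):   # example widths only
    print(f"W = {w:2d} bits -> {pcs_frequency_mhz(w):.4f} MHz")
```

For W = 66 the result equals the 156.25 MHz of the XGMII side, and every narrower bus has to run faster to carry the same line rate.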

First In First Out

The FIFO is a buffer whose purpose is to make it possible to use two different clocks in the same unit. One of the clocks is used to write to the buffer while the other is used to read from it. A FIFO can be implemented as a ring buffer with one pointer for writing and one for reading. Figure 2.4 shows an example of a FIFO with N positions and 64 bits in each position. The difference between the write pointer (wr_ptr) and the read pointer (rd_ptr) is the number of occupied positions and is called the offset.

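[Diagram: FIFO with addresses 1 to N, each position 64 bits wide; rd_ptr points at position k and wr_ptr at a later position; wr_data and wr_clk on the write side, rd_data and rd_clk on the read side]

Figure 2.4: FIFO with N positions and 64 bits in each position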
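The ring buffer idea can be sketched in a few lines of Python. This is a behavioural model only: it ignores the two clock domains of a real FIFO, and the class and attribute names are assumptions made for the illustration.

```python
class RingFifo:
    """Ring-buffer FIFO with one write pointer and one read pointer."""

    def __init__(self, depth: int):
        self.depth = depth
        self.slots = [None] * depth
        self.wr_ptr = 0   # next position to write
        self.rd_ptr = 0   # next position to read
        self.offset = 0   # number of occupied positions

    def write(self, block):
        if self.offset == self.depth:
            raise OverflowError("FIFO full")
        self.slots[self.wr_ptr] = block
        self.wr_ptr = (self.wr_ptr + 1) % self.depth   # wrap around at the end
        self.offset += 1

    def read(self):
        if self.offset == 0:
            raise IndexError("FIFO empty")
        block = self.slots[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth   # wrap around at the end
        self.offset -= 1
        return block


fifo = RingFifo(depth=4)
for block in (0xA, 0xB, 0xC):
    fifo.write(block)
print(fifo.offset)        # occupied positions after three writes
print(hex(fifo.read()))   # the oldest block is read first
```

The offset is tracked explicitly here because the two pointers alone cannot distinguish a full buffer from an empty one when they are equal.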

Encoder

The PCS uses a 64b/66b transmission code in which two bits are added to the block. These bits, called the sync header, are either '01' for a data block or '10' for a control block. A data block contains 8 bytes of data, while a control block can contain both data and control information. In a control block, the first byte indicates how the rest of the bits in the block shall be read. A table of the different blocks can be seen in [9], Figure 49-7.

Because the sync header is always '01' or '10' and never '11' or '00', there is always a transition at least once every 66 bits.



Scrambler

The scrambler is used to give the signal a more random characteristic, which reduces long runs of 0s or 1s. It is a self-synchronizing scrambler that uses the polynomial given in Equation (2.2). Figure 2.5 shows a serial implementation of the scrambler with this polynomial. A parallel implementation will be used in practice, but that is harder to visualize. The sync header bypasses the scrambler since it is used in the block synchronization discussed below.

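[Diagram: shift register with stages 1 to 58 and XOR gates at the taps after stages 39 and 58; input "Data in", output "Scrambled data out"]

Figure 2.5: An example how a serial scrambler can be implemented, where the operators are XOR-gates

$$ G(x) = x^{58} + x^{39} + 1 \tag{2.2} $$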
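A bit-serial model of the scrambler, together with the matching descrambler described later in this section, can be sketched in Python as below. It assumes the usual self-synchronizing structure for the polynomial in Equation (2.2), in which each output bit is the input bit XORed with the scrambled bits 39 and 58 positions earlier; the function names and the all-zero start state are choices made for the illustration.

```python
def scramble(bits, state=None):
    """Serial self-synchronizing scrambler for G(x) = x^58 + x^39 + 1."""
    # reg[0] is the most recent scrambled bit, reg[57] the oldest.
    reg = list(state) if state is not None else [0] * 58
    out = []
    for d in bits:
        s = d ^ reg[38] ^ reg[57]   # taps 39 and 58 bits back
        out.append(s)
        reg = [s] + reg[:-1]        # shift the scrambled bit in
    return out


def descramble(bits, state=None):
    """Inverse operation; the register holds the received scrambled bits."""
    reg = list(state) if state is not None else [0] * 58
    out = []
    for s in bits:
        out.append(s ^ reg[38] ^ reg[57])
        reg = [s] + reg[:-1]
    return out


data = [(i * 7 + 3) % 5 % 2 for i in range(200)]   # arbitrary test pattern
assert descramble(scramble(data)) == data
```

Because the descrambler register is filled only with received bits, a descrambler that starts in the wrong state produces correct output again once 58 bits have been shifted in, which is what makes the scheme self-synchronizing.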

Gearbox

The purpose of the gearbox is to change the size of the block. An incoming block is 66 bits wide while an outgoing block is W bits wide, where W ≤ 66. Both sides of the gearbox run at the same frequency and the same amount of data must pass on both sides. Because the incoming blocks are wider, some of the incoming block slots carry no valid data. For example, if W = 40 bits, a fraction of $\frac{66-40}{66} = \frac{13}{33}$ of the incoming slots is invalid: in 33 clock cycles there will be 20 valid and 13 invalid 66-bit blocks at the input and 33 40-bit blocks at the output, where all the output blocks are valid (20 × 66 = 33 × 40 = 1320 bits).
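The bookkeeping for the gearbox can be checked with a short Python sketch. It assumes that the pattern repeats after the least common multiple of the two widths in bits; the function name is illustrative.

```python
from math import gcd


def gearbox_period(in_width: int = 66, out_width: int = 40):
    """Return (cycles, valid_in, invalid_in) for one repeating gearbox period."""
    period_bits = in_width * out_width // gcd(in_width, out_width)  # least common multiple
    cycles = period_bits // out_width    # every output word is valid
    valid_in = period_bits // in_width   # input slots that carry a block
    return cycles, valid_in, cycles - valid_in


cycles, valid_in, invalid_in = gearbox_period(66, 40)
print(cycles, valid_in, invalid_in)
```

For W = 40 the period is 1320 bits, which is 33 output words of 40 bits and 20 input blocks of 66 bits, leaving 13 of the 33 input slots without valid data.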

Block synchronization

The block synchronization uses the sync header to find the block boundaries and output 66-bit blocks. It utilizes the fact that there is always a transition every 66 bits, independent of what type of data is transferred. It can also be used as part of an error detector, raising an error if the sync header is '11' or '00'.



Descrambler

The purpose of the descrambler is to remove the effect of the scrambler in the transmitting process; this is done with a descrambler according to Figure 2.6. The same polynomial that is used for the scrambler, Equation (2.2), is used for the descrambler as well. The sync header bypasses the descrambler in the same way as it did in the scrambling process.

[Diagram: shift register with stages 1 to 58 and XOR gates at the taps after stages 39 and 58; input "Scrambled data in", output "Data out"]

Figure 2.6: An example how a serial descrambler can be implemented, where the operators are XOR-gates

Decoder

The decoder will decode the data that was encoded in the transmitting process. The 64b/66b decoding will remove the sync header and the output will be 64 bits, 8 bytes of data, which is the same as the input to the encoder in the transmitter.

2.2 Precision time protocol

In this document IEEE 1588-2008 [7] will be referred to as PTP, and messages that are sent with this protocol will be referred to as PTP messages. The protocol was developed to make it possible to synchronize time and phase over Ethernet. A first version was released in 2002 and the latest revision was released in 2008.

The protocol is built as a master and slave hierarchy where the slave synchronizes its time to its master. The synchronization is done with PTP messages that contain the time of day (ToD) and are sent from the master to the slave.

2.2.1 Synchronization

In the protocol there are two kinds of PTP messages: ordinary messages and event messages. The difference between them is that the event messages need to be timestamped at ingress and egress, while the ordinary messages have no use for timestamps.



The timestamp for an event message is taken when the message passes the reference point at the ingress and egress of the node. This reference point can be determined either in software or with the help of hardware. If hardware support is chosen for the timestamps, it can still be necessary to have some software to handle the synchronization process.

The synchronization can be done in either one step or two steps. With one-step synchronization the egress time is embedded in the message that caused the timestamp. With two-step synchronization the egress time is sent in a follow-up message instead of being embedded in the event message.

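[Sequence diagram between Master and Slave, with the timestamps known by the slave after each message]
Sync:       leaves the master at t1, arrives at the slave at t2    (slave knows t2)
Follow Up:  carries t1 from the master to the slave                (slave knows t1, t2)
Delay Req:  leaves the slave at t3, arrives at the master at t4    (slave knows t1, t2, t3)
Delay Resp: carries t4 from the master to the slave                (slave knows t1, t2, t3, t4)

Figure 2.7: A two-step synchronization in PTP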

Figure 2.7 shows a two-step synchronization, and Equation (2.3) gives the calculation of the delay and the offset. The delay is the time it takes for a message to travel from the master to the slave (or from the slave to the master); the protocol assumes a symmetric delay. The offset is the difference in time between the slave and the master after compensating for the delay.

$$ t_{delay} = \frac{(t_2 - t_1) + (t_4 - t_3)}{2} \tag{2.3a} $$

$$ t_{offset} = t_2 - t_1 - t_{delay} \tag{2.3b} $$
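Equation (2.3) can be tried out with a small Python sketch. The numbers in the example are invented: a link with a true one-way delay of 50 ns in both directions and a slave clock that runs 200 ns ahead of the master.

```python
def ptp_delay_and_offset(t1, t2, t3, t4):
    """Delay and offset according to Equations (2.3a) and (2.3b)."""
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = t2 - t1 - delay
    return delay, offset


# Invented example values in nanoseconds.
t1 = 1000   # master clock when Sync leaves the master
t2 = 1250   # slave clock when Sync arrives (1000 + 50 delay + 200 offset)
t3 = 1500   # slave clock when Delay Req leaves the slave
t4 = 1350   # master clock when Delay Req arrives (1500 - 200 offset + 50 delay)

delay, offset = ptp_delay_and_offset(t1, t2, t3, t4)
print(delay, offset)
```

In this symmetric case the calculation returns exactly the 50 ns delay and the 200 ns offset that were used to construct the timestamps.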

2.2.2 Different types of clocks

An ordinary clock (OC) can serve as either a slave or a master. When an OC is mentioned later in this thesis it refers to a slave unless something else is stated.



The grandmaster (GM) clock is the master clock and holds the actual time that the rest of the system synchronizes its clocks to. Several clocks in the system can claim the GM position, but only one clock at a time can be the GM. To decide which clock is the most suitable to take the role of GM, all clocks that can be a master send out an announce message. Each node then runs an algorithm called the best master clock algorithm (BMCA). The algorithm is designed so that all nodes make the same decision about which clock is the best master clock and will be the GM. If several clocks have the same performance, the last criterion in the algorithm is the clock identity, which serves as a tie-breaker since each clock has a unique identity.
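The selection principle can be sketched in Python as an ordered comparison where the lowest value wins and the clock identity is compared last. The attribute order assumed here (priority1, clock class, clock accuracy, variance, priority2, clock identity) follows the usual description of the IEEE 1588-2008 data set comparison; the sketch is a simplification that leaves out the topology-related steps of the real algorithm, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    """Attributes a candidate master advertises (simplified, assumed field set)."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int
    clock_identity: bytes   # unique per clock, used as the final tie-breaker

    def rank(self):
        # Tuples compare element by element, so earlier fields dominate.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)


def best_master(candidates):
    """Every node applies the same ordering, so all nodes pick the same clock."""
    return min(candidates, key=AnnouncedClock.rank)


a = AnnouncedClock(128, 6, 0x21, 100, 128, bytes.fromhex("0000000000000002"))
b = AnnouncedClock(128, 6, 0x21, 100, 128, bytes.fromhex("0000000000000001"))
print(best_master([a, b]).clock_identity.hex())   # equal quality: identity decides
```

Since the ranking depends only on the advertised attributes, two nodes that have seen the same announce messages reach the same result independently.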

A boundary clock (BC), which can be seen in Figure 2.8, is a clock that serves as a slave on one of its ports and synchronizes its local clock to the master. The BC then acts as a master towards the rest of the system, with its local clock as the reference time. This clock is useful in a switch or router in a larger network to reduce the workload of the GM and to reduce the time error through the switch or router.

[Diagram: a boundary clock (BC) with one slave port and two master ports]

Figure 2.8: Illustration of a boundary clock

The transparent clock (TC), which can be seen in Figure 2.9, is another way to reduce the time error through a switch or router. In contrast to the BC, it does not have a local clock that needs to be synchronized to its master. Instead, the TC calculates the time a packet spends in the switch or router and compensates for it. A TC can be combined with an OC that synchronizes to the GM, so that the node supports a network element and does not only serve as a switch or router.
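The compensation done by a TC can be sketched in Python as the accumulation of the residence time in a correction value that travels with the message. The class, its fields and the function name are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PtpEventMessage:
    """Hypothetical, minimal view of an event message passing through TCs."""
    origin_timestamp_ns: float
    correction_ns: float = 0.0   # accumulated residence time


def forward_through_tc(msg: PtpEventMessage, ingress_ns: float, egress_ns: float):
    """Add the time the message spent inside the node to its correction value."""
    msg.correction_ns += egress_ns - ingress_ns
    return msg


msg = PtpEventMessage(origin_timestamp_ns=0.0)
forward_through_tc(msg, ingress_ns=100.0, egress_ns=130.0)   # 30 ns in the first node
forward_through_tc(msg, ingress_ns=500.0, egress_ns=545.0)   # 45 ns in the second node
print(msg.correction_ns)
```

The ingress and egress timestamps of one node are taken with the same local counter, so only their difference matters, and the receiving slave can subtract the accumulated 75 ns from its measured transit time.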



[Diagram: a transparent clock (TC)]

Figure 2.9: Illustration of a transparent clock

2.3 Synchronous Ethernet

SyncE is a recommendation from ITU-T on how to deliver a frequency in a network; the recommendations are described in [4] and [5]. According to the recommendation the frequency is recovered from the bit stream in the physical layer. The clock that is distributed in the chain is called the primary reference clock (PRC), and all clocks in the network shall be traceable to that clock. To get a traceable clock, all nodes in a chain between the master and the end device need to be implemented with a synchronous Ethernet equipment clock (EEC) according to the SyncE recommendations. The performance of the recovered clock does not depend on the network load, since it does not synchronize with any specific packet [16]. Figure 2.10 presents a small network that uses SyncE.

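[Diagram: three nodes labelled Master, Slave and Slave, each with higher layers on top of its PHYs; the master contains the PRC, each slave contains an EEC between its PHYs, and the links between the nodes are labelled Sync]

Figure 2.10: Three nodes implemented with Synchronous Ethernet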

2.4 Expressions

2.4.1 parts per million

Parts per million (ppm) is a measure of how accurate a clock is. It states how inaccurate the clock is allowed to be relative to its specification or to another clock. For example, a clock with a frequency of 250 MHz and an accuracy of 100 ppm will have a frequency of $250 \pm 250 \cdot \frac{100}{10^6}$ MHz, which gives a minimum frequency of 249.975 MHz and a maximum frequency of 250.025 MHz.
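The same calculation can be written as a small Python helper; the function name is illustrative.

```python
def frequency_bounds_mhz(nominal_mhz: float, accuracy_ppm: float):
    """Lowest and highest frequency allowed by a ppm accuracy figure."""
    deviation = nominal_mhz * accuracy_ppm / 1e6
    return nominal_mhz - deviation, nominal_mhz + deviation


low, high = frequency_bounds_mhz(250.0, 100.0)
print(f"{low:.3f} MHz to {high:.3f} MHz")
```

For the 250 MHz example the allowed deviation is 0.025 MHz in each direction.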

2.4.2 Free running

A free running clock is a clock that is not synchronized to any other clock or system. This means that two similar clocks that are in free run mode can have slightly different frequencies and most likely different phases.

2.4.3 Hop

A hop in a computer network is one portion of the path, traversed when a packet passes through a forwarding node, for example a router. The total number of hops between a slave and a master is the number of nodes a packet needs to go through before it reaches its final destination. Figure 1.2 shows how this expression is used.

2.4.4 Topology

In this thesis, topology refers to the network topology, which describes how the nodes are arranged in the network and which nodes are linked to each other. The topology can show how the data packets are sent from one node to another and which data path a packet can take to reach its final destination.

2.4.5 Ingress

The ingress is the input path in the node when data is received.

2.4.6 Egress

The egress is the output path in the node when data is transmitted.


Chapter 3

Time Accuracy

3.1 Problems

In Sections 3.1.1-3.1.4 the system contains only two nodes, as shown in Figure 3.1. The left node acts as a master and the right node serves as a slave. With this smaller system it is easier to describe which timing problems occur between two nodes. In Section 3.1.5 a bigger system is used to describe which problems occur when a cascaded system is used and the time information needs to be forwarded through one or several nodes.

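[Diagram: a Master and a Slave connected by two paths, with delay t_ms from master to slave and delay t_sm from slave to master]

Figure 3.1: A simple system of two nodes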

3.1.1 Reference point

When looking at the timing between two nodes, the timestamp reference point has a large impact on the accuracy. The reference point can be placed in software, either in an application or in an interrupt handler, or hardware support can be used to determine the timestamps. The different methods can be seen in Figure 3.2. If the reference point is placed in software, it is hard to know the path and the latency from the communication medium to the reference point, which is symbolized by the cloud. In most cases a PTP software block is necessary even if the timestamp is determined with hardware support.


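[Diagram: a PTP software block on top of the PHY, which connects to the communication medium; the hardware reference point is at the PHY and the software reference point is at the PTP software block, with a cloud between the PHY and the software]

Figure 3.2: Different choices for the PTPs reference point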

A software solution is cheap and easy to implement since no specific hardware is needed, but it is problematic to estimate the delay between the arrival of the message and the moment when the software reads the message and takes the timestamp. In a software solution the timestamp can be taken in an application or, in the best case, in an interrupt.

A hardware solution is closer to the communication medium, which makes it easier to estimate the delay between the physical medium and the point where the timestamp is taken. It needs some hardware assistance, not only to take the timestamp but also to take a fingerprint. The fingerprint is used as an ID to match the timestamp with its PTP packet in the PTP software block.

Another problem with the hardware solution is that the timestamp is taken when the SFD byte of the Ethernet packet passes the reference point. At this moment the system does not know whether the message is an event message or not. That information arrives later in the PTP header, which is located first in the Ethernet payload according to Figure 2.2.

3.1.2 Resolution of the timestamp

The ToD is stored as a counter that updates the time every clock cycle. This means that the time has the same resolution as the period of the clock.
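As a quick numerical illustration, the Python sketch below converts a counter clock frequency into the resulting timestamp resolution. The use of the 156.25 MHz XGMII clock as the counter clock is only an assumption for the example.

```python
def timestamp_resolution_ns(clock_mhz: float) -> float:
    """Period of the counter clock in nanoseconds, i.e. the timestamp resolution."""
    return 1e3 / clock_mhz


print(timestamp_resolution_ns(156.25))   # assumed counter clock: the XGMII clock
```

A counter clocked at 156.25 MHz advances in steps of 6.4 ns, so a single timestamp cannot resolve anything finer than that.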

3.1.3 Asymmetry

PTP assumes that the delay from the master to the slave ($t_{ms}$) is the same as the delay from the slave to the master ($t_{sm}$). If this is the case, the delay between the two nodes is called symmetrical. Since the protocol assumes that the delay is symmetrical, any asymmetry contributes a time error of magnitude $\frac{|t_{ms} - t_{sm}|}{2}$. The asymmetry can be divided into asymmetry in the communication medium and asymmetry in the node.

The asymmetry in the communication medium depends on which medium is used. If optical fiber is used, the asymmetry can come from the different lasers that are used. This asymmetry is individual for each cable, and the delay may even vary with the temperature of the cable's environment.
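To make the effect of asymmetry concrete, the sketch below runs one Sync and Delay Req exchange through the standard PTP offset calculation and shows how unequal path delays appear as an offset error. The function names and the example delays are illustrative assumptions, not values from this thesis.

```python
def ptp_offset(t1, t2, t3, t4):
    """Slave offset from the master as PTP computes it, assuming a symmetric path."""
    return ((t2 - t1) - (t4 - t3)) / 2


def simulate_exchange(true_offset_ns, t_ms_ns, t_sm_ns):
    """Return the offset the slave estimates for the given one-way delays."""
    t1 = 0.0                                # Sync leaves the master (master time)
    t2 = t1 + t_ms_ns + true_offset_ns      # Sync arrives at the slave (slave time)
    t3 = t2 + 1000.0                        # Delay Req leaves the slave (slave time)
    t4 = t3 - true_offset_ns + t_sm_ns      # Delay Req arrives at the master (master time)
    return ptp_offset(t1, t2, t3, t4)


# Hypothetical link: 500 ns downstream, 480 ns upstream, clocks already aligned.
estimate = simulate_exchange(0.0, 500.0, 480.0)
print(estimate)  # 10.0, which is (500 - 480) / 2
```

With equal delays the estimate matches the true offset exactly, so the residual seen here comes only from the asymmetry.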

PTP assumes that the timestamp is measured at the timestamp point, but that is not possible since the message is scrambled at that point. The timestamp is instead measured at the reference point, and there is latency between the reference point and the timestamp point. In Figure 3.3 the latency is called transmitting latency or receiving latency depending on whether the message is transmitted or received. When the two latencies differ, they contribute an asymmetry error. If the latency were the same for both the transmitting and the receiving part, the latency error would be eliminated.

Figure 3.3: Showing the asymmetry in a node (diagram: a transmitting block and a receiving block, each with a reference point; the transmitting latency and the receiving latency separate the reference points from the timestamp point at the communication medium)

If the asymmetry is known it can be compensated for, but if it varies it is much harder to correct. In that case it is only possible to partly reduce the error. In the PCS there is a variable delay that occurs in the FIFO and in the gearbox.

Variable delay in the FIFO buffer

The input and output of the FIFO use two different frequencies but the same bit width, which means that some data on the side with the faster frequency is invalid. As assumed previously in this thesis, one side of the FIFO has a bit width of 64 bits and a frequency of 156.25 MHz. The other side has the same bit width of 64 bits but a frequency that depends on the bit width between the PCS and the PMA. This relation can be found in Equation (2.1). Because of the different frequencies and the invalid data, the delay in the FIFO is different for each block of data. This is a variable delay that occurs due to the different frequencies.

In Figure 3.4 an example is used where the bit width between the PCS and the PMA is 40. With the selected bit width of 40, the internal frequency is 257.8125 MHz. The time a data block spends in the FIFO is marked red and has a value of 64 in "bits in FIFO". The figure only represents the difference in the number of bits in the FIFO and not the absolute value. Where it shows 0 bits, it represents the lowest level stored in the FIFO and not necessarily 0. A preferred solution would be to store an integer number of periods in the FIFO. In this case, with a bit width of 40, a period is 128 ns. That corresponds to 20 cycles of Clk1, so an integer number of periods in the FIFO would be N*1280 bits, where N is an integer. The grey area in the figure represents the invalid data block that is transported internally in the PCS.

Figure 3.4: An example where data1 writes to the FIFO with clock1 and data2 reads from the FIFO with clock2 (timing diagram: Clk1 at 156 MHz, Clk2 at 258 MHz, Data1 into the FIFO, bits in the FIFO alternating between 0 and 64, Data2 out of the FIFO)
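The clock relation behind this example can be checked numerically. The sketch below assumes the PMA side clock is simply the 10.3125 GHz line rate divided by the interface width and uses the 128 ns period quoted for the 40-bit case; the function names are illustrative.

```python
from fractions import Fraction

LINE_RATE_HZ = Fraction(10_312_500_000)   # 10.3125 GHz serial rate
CLK1_HZ = Fraction(156_250_000)           # 64-bit side of the FIFO


def pma_clock_hz(bit_width):
    """Clock on the PCS/PMA side for a given interface width (assumed relation)."""
    return LINE_RATE_HZ / bit_width


def cycles_per_period(bit_width, period_s=Fraction(128, 10**9)):
    """Number of Clk1 and Clk2 cycles that fit in one period (128 ns by default)."""
    return CLK1_HZ * period_s, pma_clock_hz(bit_width) * period_s


print(float(pma_clock_hz(40)) / 1e6)   # 257.8125 (MHz)
clk1_cycles, clk2_cycles = cycles_per_period(40)
print(clk1_cycles, clk2_cycles)        # 20 33
print(clk1_cycles * 64)                # 1280 bits written per period
```

Exact fractions are used so that the cycle counts come out as integers without rounding.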

Variable delay in the gearbox

The gearbox has an input of 66-bit blocks and an output bit width that is the same as the bit width between the PCS and the PMA. The gearbox therefore has a buffer that is filled to different levels at different times. The time from the input of the gearbox to the communication medium depends on how full the buffer is. The buffer has the same clock for reading and writing, so by reading out how full the buffer is this delay can be compensated for.

3.1.4 Frequency accuracy

The accuracy of the frequency can also contribute to a time error and therefore needs to be mentioned. If a system has a requirement on the time accuracy, a highly accurate local clock is probably used. In such a system there might not be any problem with the frequency accuracy. Since there is no specified requirement on the local clock in this thesis, it is necessary to take this error into account as well.

The frequency error occurs when the frequency of the master clock differs from the frequency of the slave clock. In this thesis an assumption is made that the master contains a very accurate clock, for example with a GPS signal as a source and an accurate oscillator. A source like that can give an accuracy better than $10^{-13}$ [16]. If the slave node, for example, uses a local clock with an accuracy of ±100 ppm (which is the requirement for Ethernet), the difference in time between the master and the slave can be 100 µs after one second.

3.1.5 Packet delay variation

The previously mentioned problems occur in a single node, so all of them contribute a time error at every node of a cascaded system. In a cascaded system it is also necessary to consider the time a packet spends inside a node. In a network with nodes that forward packets (switches and routers), a packet is received, placed in a queue and then transmitted again. The time it spends inside a node depends in part on the queue, and the variation in this time is called packet delay variation. PTP offers two methods to implement a forwarding node that handles this delay; they are discussed further in Section 3.2.4. PTP can also be implemented without handling this delay at all, but the accuracy of the synchronization and of the delay measurement will then decrease.

3.2 Possible solutions

3.2.1 Reference point

To get good accuracy the timestamp needs to be taken with hardware support. Since the timestamp is taken when the SFD passes the reference point, all messages need to be timestamped. After the PTP header is read, a decision can be made whether the timestamp shall be passed to the PTP software block or discarded.

The best place to put the hardware is after the PCS, because before the PCS the messages are still scrambled and no information can be read from them. It is good to have the reference point close to the communication medium to get the lowest variable delay and the best possible time accuracy. Therefore the best point to take the timestamp is in the XGMII, while information is transmitted between the PCS and the MAC.

The fingerprint is used to bring the event message together with its timestamp in the software block. It can be built from the sequenceId, the message type and the source address. The sequenceId is a number that increases for each transmission of a specific message. By saving a fingerprint with the timestamp and comparing it with the incoming event message in the software block, the correct timestamp is paired with the correct message.
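A minimal sketch of this matching step is shown below. The class and method names are hypothetical and only illustrate the idea of keying stored timestamps on the three fingerprint fields; a real design would also need to age out timestamps for non-PTP frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Identifies one event message (fields as listed in the text)."""
    source_address: str
    message_type: int
    sequence_id: int


class TimestampStore:
    """Holds hardware timestamps until the PTP software block claims them."""

    def __init__(self):
        self._pending = {}

    def on_hardware_timestamp(self, fingerprint, timestamp_ns):
        # Called by the (hypothetical) hardware interface once the header is parsed.
        self._pending[fingerprint] = timestamp_ns

    def match(self, fingerprint):
        """Return and remove the timestamp for a message, or None if there is none."""
        return self._pending.pop(fingerprint, None)


store = TimestampStore()
store.on_hardware_timestamp(Fingerprint("00:11:22:33:44:55", 0, 17), 1_000_008)
print(store.match(Fingerprint("00:11:22:33:44:55", 0, 17)))  # 1000008
print(store.match(Fingerprint("00:11:22:33:44:55", 0, 18)))  # None
```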



3.2.2 Resolution of timestamp

The clock that updates the ToD gives the timestamp a resolution equal to its period. The ToD accumulates the period every clock cycle to keep track of the time. An oscillator that can be used for this purpose is a free running 125 MHz clock, which gives a resolution of 8 ns [13]. It is also possible to use faster oscillators, such as a 250 MHz clock, which gives a resolution of 4 ns.

Instead of using a free running clock for the ToD, a recovered clock can be used. This has the same resolution problem, but instead of a period of 8 or 4 ns the period is a fractional value. By choosing a bit width of 16 bits between the PCS and the PMA, the clock frequency is 644.53 MHz and the resolution of the accumulated time is about 1.55 ns.
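The resolution for each clock choice is simply the clock period, which the following sketch computes. The recovered clock is taken as the 10.3125 GHz line rate divided by the 16-bit interface width; the function name is illustrative.

```python
def tod_resolution_ns(clock_hz):
    """Period of the clock that increments the ToD, in nanoseconds."""
    return 1e9 / clock_hz


for label, clock_hz in [
    ("free running 125 MHz", 125e6),
    ("free running 250 MHz", 250e6),
    ("recovered, 16-bit width", 10.3125e9 / 16),
]:
    print(f"{label}: {tod_resolution_ns(clock_hz):.3f} ns")
```

The recovered clock gives the finest resolution of the three, but its period is not a whole number of nanoseconds.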

3.2.3 Asymmetry

The asymmetry was divided into asymmetry inside the node and asymmetry between two nodes. According to Section 1.3.3 the only asymmetry that is important is the one that occurs inside the node. But when PTP is used the delay is measured with the help of the timestamps, that is, as the time between two reference points in two different nodes. As a result, the asymmetry in the medium contributes a time error to the PTP delay measurement. With that in mind it might be necessary to move the time budget from the communication medium to the node or to a delay measurement budget.

If the fixed latency between the timestamp and the communication medium is known, it can be compensated for by using Equation (3.1), where T stands for transmit and R for receive. But only the fixed delay can be completely removed by this equation. A variable delay can only be reduced by estimating the average delay, which in the best case halves the maximum error.

$$T_{\text{Timestamp}} = T_{\text{MeasuredTimestamp}} + T_{\text{Latency}} \tag{3.1a}$$

$$R_{\text{Timestamp}} = R_{\text{MeasuredTimestamp}} - R_{\text{Latency}} \tag{3.1b}$$
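Equation (3.1) and the handling of a variable delay can be sketched as follows. The function names and the example latency values are assumptions for illustration; the midpoint rule is one way to realise the "average delay" compensation, which leaves a residual error of at most half the spread.

```python
def compensate_tx(measured_ns, tx_latency_ns):
    """Equation (3.1a): the frame reaches the medium after the reference point."""
    return measured_ns + tx_latency_ns


def compensate_rx(measured_ns, rx_latency_ns):
    """Equation (3.1b): the frame left the medium before the reference point."""
    return measured_ns - rx_latency_ns


def midpoint_latency(min_ns, max_ns):
    """Latency to apply when the true value varies between known bounds."""
    return (min_ns + max_ns) / 2


def residual_error(min_ns, max_ns):
    """Worst-case error left after compensating with the midpoint."""
    return (max_ns - min_ns) / 2


# Hypothetical receive path: 20 ns fixed latency plus 0 to 1.36 ns variable delay.
latency = midpoint_latency(20.0, 21.36)
print(compensate_rx(1000.0, latency))
print(residual_error(20.0, 21.36))
```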

Figure 3.4 showed an example of the variable delay in the FIFO. When hardware support is used for the timestamp and it is located at the XGMII as discussed previously, the delay from the PCS is the only variable delay before the timestamp reference point. The delay depends on the frequency, which in turn depends on the bit width to the PMA. By studying different bit widths and calculating the variable delay it can be seen that with a higher frequency (smaller bit width) the variable delay decreases. Figure 3.5 shows the theoretical variable delay through an ideal FIFO for different bit widths. The variable delay is the difference between the shortest and the longest time a data block is stored in the FIFO. To get this result the data is periodically transferred through the PCS in a cycle of 33 clock cycles. The valid blocks are spread out over the whole period to get as equal a time in the FIFO as possible. The FIFO is presumed to hold data for at least one period at the start.

Figure 3.5: The chart shows the variable delay in the FIFO for different bit widths between PCS and PMA (bar chart; x-axis: bit width 64, 40, 32, 20, 16, 10, 8; y-axis: variable delay from 0 to 7 ns)

The variable delay in the gearbox also depends on the bit width. Because the same clock is used for both input and output in the gearbox, the variable delay is cyclic. The cycle always contains 33 clock cycles for the selected values of the bit width. The SFD that shall be timestamped is always at the same position in an incoming 66-bit block. The only part that can differ is how full the gearbox is when the block arrives. Since the delay depends on how full the buffer is, and this pattern repeats every 33 cycles, it is possible to compensate for it with a variable value that depends on the number of bits in the buffer.

According to the Ethernet standard [9], 16 bits is the original bit width between the PCS and the PMA. This gives a variable delay of 1.36 ns from the FIFO. If it is compensated with the help of Equation (3.1), the contribution to the time error is 0.68 ns. This value is for a well designed FIFO with highly accurate clocks for reading and writing. If the clock for either writing or reading is less accurate, the clocks can start to drift against each other, which contributes a bigger time error.



3.2.4 Precision time protocol implementation

The implementation of PTP can vary depending on how accurate the protocol needs to be and how much it is allowed to cost. In this document we have already assumed that the timestamp point is in hardware between the PHY and the MAC. Another thing that can be assumed about the system is that only one node is considered to be the master node. Because of that, the announce message, which is sent out by all master nodes so that the GM can be decided, is not that important. The BMAC that all nodes use to decide which is the best master node is not of great interest either. Therefore those will only be mentioned and not handled in as much detail as the rest of the implementation choices.

Boundary clock vs Transparent clock

If a node is not only a slave but also forwards timing information further down in the system, it needs some more functionality. To make it accurate it can be designed with either BCs or TCs, but it can also be a combination of both. First the two are compared in the selected system, and then there is a short discussion on which solution is preferred for the given system.

When the system uses BCs, each node synchronizes its local clock to its master. In a cascaded system each node depends on the previous clock. Each node has a control loop, and by cascading these loops jitter and time error accumulate through the system. Therefore a highly cascaded network with BCs can cause accuracy problems. The error can be reduced by using high quality oscillators, but that is an expensive solution and does not remove the problem with the cascaded control loops. A better solution is to use the TC, which is more suitable for the cascaded network because it does not contain any control loop [10].

The BC is more suitable in systems with a tree topology, where one input leads to several outputs that shall be fed. The BC then moves workload from the GM to the BC, because sync and delay request messages do not pass through the BC. The BC synchronizes to the master as an OC and then acts as a master, sending out separate sync and delay request messages to the rest of the system. In that way the master does not need to know what the system looks like after the BC.



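Figure 3.6: Show a network where a boundary clock is useful to reduce the workload from the grandmaster (diagram: a GM connected to two OCGM nodes and one BCGM; the BC has one slave port (S) and three master ports (M), each feeding an OCBC node)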

Figure 3.6 shows how a BC can be used. All the end devices have an OC that is synchronized to its master. The OCGM and the BCGM are synchronized to the GM, and the OCBC are synchronized to the BC. The GM only synchronizes 3 nodes instead of the 5 that would be necessary without the BC.

The TC does not need to synchronize to its master. Instead it keeps track of how long the packets spend inside the node and puts this time (the correction time) in the correction field of the PTP message. If two-step synchronization is used, the TC includes the correction time in the follow up message and the delay response message. If one-step synchronization is used, the correction time is added directly to the sync message and the delay request message. Figure 3.7 shows how the TC obtains the correction time for the correction field.



Figure 3.7: The time estimation in a transparent clock (diagram: ingress, residence time bridge and egress; the local time provides an ingress timestamp and an egress timestamp, and their difference is the correction time)

Since the local clock does not need to synchronize to the master, it does not have the same problem as the BC with the cascaded control loops. The nodes do not depend on each other as they did with BCs; the only time error arises if the clock is free running and estimates the residence time incorrectly. For example, if an oscillator with an error of 100 ppm is used and the latency through a TC is 1 µs, the maximum time error for the TC is $\frac{100}{10^{6}} \cdot 1 \cdot 10^{-6}\ \text{s} = 100$ ps.
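The residence time update and its worst-case error can be sketched as below. The function names are illustrative, and the sketch keeps the correction value in plain nanoseconds rather than modelling the on-wire encoding of the correction field.

```python
def residence_time_ns(ingress_ns, egress_ns):
    """Time the packet spent inside the transparent clock."""
    return egress_ns - ingress_ns


def add_correction(correction_ns, ingress_ns, egress_ns):
    """Accumulate this node's residence time into the running correction."""
    return correction_ns + residence_time_ns(ingress_ns, egress_ns)


def max_residence_error_ps(residence_ns, clock_error_ppm):
    """Worst-case error from measuring the residence time with a free running clock."""
    # ns * ppm * 1e-6 gives ns; multiplying by 1000 gives ps.
    return residence_ns * clock_error_ppm / 1000


# Two cascaded TCs with hypothetical ingress and egress timestamps.
correction = 0
correction = add_correction(correction, 5_000, 6_000)
correction = add_correction(correction, 9_200, 10_050)
print(correction)                          # 1850
print(max_residence_error_ps(1000, 100))   # 100.0
```

Because the error scales with the residence time and not with the elapsed time since the last synchronization, even a 100 ppm oscillator adds only picoseconds per node.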

If a network element shall be used in the node with a TC, it needs to have an OC attached to it. This OC keeps track of the ToD, which an ordinary TC does not do, but it does not affect nodes further down in the system.

In most cases the TC is preferred, especially in a highly cascaded system. In a big tree topology it can be useful to have some BCs to reduce the workload of the GM.

In [14] the accuracy of the BC v1 and the TC was studied in a highly cascaded network with up to 30 nodes. The result shows that the TC gets a much lower maximum jitter than the BC. At 10 nodes, which is the case of interest for this document, the TC implementation is about three times better than the BC implementation.

One-step vs Two-step

Two-step synchronization is shown in Figure 2.7, where the timestamp of the sync message is sent in a follow up message and not embedded. If one-step synchronization is used instead of two-step, the timestamp needs to be embedded in the sync message itself, and the follow up message is therefore removed. If the timestamps are generated at hardware level, they need to be embedded in hardware as well. The timestamp is then taken before the reference point when transmitting, and an estimate is made of how long it takes to travel from the timestamp point to the reference point. The FCS in the Ethernet frame must also be recalculated in all the forwarding nodes when one-step synchronization is used. Therefore it is easier to use two-step synchronization when the timestamp reference point is at hardware level.

The same reasoning can be used with the Pdelay response follow up message if Peer-to-Peer (P2P) is used.

End-to-End vs Peer-to-Peer

Provided that the TC is selected, it can be implemented with either End-to-End (E2E) or P2P. Both use the same technique to send synchronization messages but differ in how they deal with the delay estimation. Figure 3.8 shows the different messages for each type of delay estimation.

Figure 3.8: The difference between End-to-End (solid line) and Peer-to-Peer (dotted line) delay estimation (diagram: GM, TC, TC and OC in a chain; E2E uses Delay Req and Delay Resp, P2P uses Pdelay Req, Pdelay Resp and Pdelay Resp Follow Up)

When E2E is used, the master clock gets a delay request message from each slave and sends out a delay response message as an answer to each request. The slave then calculates the path delay between the master and itself. With this implementation it is not necessary to have any specific PTP routers or switches in the system; it is enough to have PTP implemented at the end devices that send and receive the synchronization. But if there are routers or switches that do not have PTP implemented, the accuracy is heavily reduced, especially in highly loaded networks.

If P2P is used instead of E2E, each node only communicates with its neighbor to calculate the path delay. Each node sends a Pdelay request message to the previous node, which answers with a Pdelay response (and a Pdelay response follow up message if two-step synchronization is used). In this way the receiving node knows the delay from its neighbor and can compensate for it when receiving a synchronization message.
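The link delay obtained from one Pdelay exchange follows the usual peer delay calculation, sketched below under the assumption of a symmetric link. The timestamp names t1 to t4 and the example values are illustrative.

```python
def mean_link_delay(t1, t2, t3, t4):
    """One-way delay to the neighbor from a single Pdelay exchange.

    t1: Pdelay Req sent by the requester    (requester clock)
    t2: Pdelay Req received by the neighbor (neighbor clock)
    t3: Pdelay Resp sent by the neighbor    (neighbor clock)
    t4: Pdelay Resp received by requester   (requester clock)
    """
    round_trip = t4 - t1
    turnaround = t3 - t2
    return (round_trip - turnaround) / 2


# Hypothetical 100 ns link; the neighbor clock is 5 ns ahead and answers after 100 ns.
print(mean_link_delay(t1=0, t2=105, t3=205, t4=300))  # 100.0
```

The offset between the two clocks cancels because t2 and t3 are both taken on the neighbor's clock.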



With P2P there are only 3 (2 if one-step is used) messages between each pair of neighboring nodes for one delay calculation. In E2E the number of messages between two nodes increases with the total number of nodes. For E2E the average number of PTP delay messages between two nodes is the same as the number of nodes in the cascaded system. The workload is also moved from the GM to the slaves when P2P is selected. Therefore P2P is preferable in a bigger system.

As an example, consider a cascaded system with 10 nodes. The total number of delay messages needed to update the path delay is 90 for E2E but only 27 for P2P. It is even lower if one-step synchronization is used.
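One way to reproduce these counts is to count every link a delay message crosses in a chain of nodes. This counting model is an assumption that happens to match the numbers quoted here; the function names are illustrative.

```python
def e2e_delay_messages(nodes):
    """Link crossings for one E2E delay update in a chain with one master.

    A slave that is `hops` links from the master sends a Delay Req and
    receives a Delay Resp, and each of them crosses `hops` links.
    """
    return sum(2 * hops for hops in range(1, nodes))


def p2p_delay_messages(nodes, two_step=True):
    """Messages for one P2P delay update: a fixed number per link."""
    per_link = 3 if two_step else 2
    return per_link * (nodes - 1)


print(e2e_delay_messages(10))                  # 90
print(p2p_delay_messages(10))                  # 27
print(p2p_delay_messages(10, two_step=False))  # 18
print(e2e_delay_messages(10) / (10 - 1))       # 10.0 messages per link on average
```

The E2E count grows quadratically with the chain length, while the P2P count grows linearly.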

When two-step synchronization is used there is another advantage of using P2P instead of E2E. When a delay request message passes through a node, a timestamp is taken at ingress and at egress. The difference is added to the correction field of the delay response message. Therefore a node needs to store the correction time until the delay response message passes the same node on the way back. In a chain topology a node may need to store several correction times for several different nodes further down in the chain. This requires extra memory and extra logic to handle the correction time for an E2E solution.

Message interval

There are 3 messages that need to have a message interval defined. These are the sync, delay request (or Pdelay request) and announce messages. A follow up is sent after each sync message and therefore has the same interval as the sync message. A delay response is sent as an answer to each delay request. If P2P is used, the Pdelay response follow up message serves as a follow up to the Pdelay response and has the same interval as the Pdelay request and the Pdelay response.

The message interval is set with an 8-bit two's complement number that holds the base-2 logarithm of the interval in seconds. If the sync message interval is set to -4, there are $2^{-4}$ s between each sync message, which is the same as 16 sync messages each second. The protocol also defines this time to be the average of the message interval, and each interval will with 90% confidence not differ by more than 30% from the average. This is necessary information when discussing the worst case scenario, but when estimating the accuracy the average can be used, and it is used further in this document.
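The encoding can be illustrated with a few lines of code. The helper names are hypothetical; the decoding step assumes the value arrives as a raw unsigned byte.

```python
def decode_int8(raw_byte):
    """Interpret a raw byte (0..255) as an 8-bit two's complement number."""
    return raw_byte - 256 if raw_byte >= 128 else raw_byte


def interval_seconds(log_interval):
    """Mean time between messages for a given log2 interval value."""
    return 2.0 ** log_interval


def messages_per_second(log_interval):
    """Mean message rate for a given log2 interval value."""
    return 2.0 ** -log_interval


log_sync = decode_int8(0xFC)          # 0xFC is -4 in two's complement
print(log_sync)                       # -4
print(interval_seconds(log_sync))     # 0.0625
print(messages_per_second(log_sync))  # 16.0
print(messages_per_second(-5))        # 32.0
```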

The announce message is used to decide which master clock will serve as the GM. Since the system in this document only has one GM, the announce message cannot be used to change master; it can only indicate whether the network is broken or not. Therefore it is not that important to have a high message rate for the announce messages.

Sync messages are broadcast from the master to give the slaves the correct ToD. A higher message rate can compensate for an inaccurate local clock in the slave, at the cost of more network traffic. If the slave has a local PTP clock with an accuracy of 100 ppm relative to the master, the slave can have an error of 100 µs after one second. But if the sync message rate is 32 messages/s, the error is reduced by a factor of 32. This results in a maximum error of 3.125 µs, independent of how long the system has been running.
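The relation between clock accuracy, sync rate and the drift accumulated between two sync messages is shown in the sketch below. It assumes the slave is fully corrected at each sync message and drifts freely in between; the function name is illustrative.

```python
def max_drift_us(clock_error_ppm, sync_rate_per_s):
    """Worst-case drift between two sync messages, in microseconds.

    ppm * 1e-6 is the fractional error and 1 / rate the time between
    messages in seconds, so the product in microseconds is ppm / rate.
    """
    return clock_error_ppm / sync_rate_per_s


for rate in (1, 8, 16, 32, 128):
    print(f"{rate:>3} messages/s: {max_drift_us(100, rate)} us")
```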

The delay request and the delay response are used to measure the delay between two nodes. The contribution of the asymmetry ends up in these delay measurements. Even if the communication medium is handled in a separate budget, it is hard to determine where the asymmetry occurs with the help of these measurements. One advantage is that the delay measurement is done continuously, so if the delay changes (for example with the temperature) the measurement will notice the change.

Message                  PTP message size   Ethernet packet size
Announce                 64                 102
Sync                     44                 84
Follow Up                44                 84
Delay Req                44                 84
Delay Resp               54                 92
Pdelay Req               54                 92
Pdelay Resp              54                 92
Pdelay Resp Follow Up    54                 92

Table 3.1: Show the message size for the PTP message in bytes

Table 3.1 shows the size of each message. The PTP messages that only contain 44 bytes need 2 extra padding bytes to reach the Ethernet requirement of a minimum payload of 46 bytes. The Ethernet packet size includes the preamble and the 12-byte IPG.
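The Ethernet packet sizes in the table can be reproduced from the PTP message sizes. The overhead breakdown below (8 bytes preamble including SFD, 14 bytes MAC header without VLAN tag, 4 bytes FCS and 12 bytes IPG) is an assumption that matches the table values; the names are illustrative.

```python
MIN_PAYLOAD = 46                 # minimum Ethernet payload in bytes
OVERHEAD = 8 + 14 + 4 + 12       # preamble+SFD, MAC header, FCS, IPG (assumed)

PTP_MESSAGE_BYTES = {
    "Announce": 64,
    "Sync": 44,
    "Follow Up": 44,
    "Delay Req": 44,
    "Delay Resp": 54,
    "Pdelay Req": 54,
    "Pdelay Resp": 54,
    "Pdelay Resp Follow Up": 54,
}


def ethernet_packet_size(ptp_bytes):
    """Bytes occupied on the wire, including padding up to the minimum payload."""
    return max(ptp_bytes, MIN_PAYLOAD) + OVERHEAD


for name, size in PTP_MESSAGE_BYTES.items():
    print(f"{name:<22} {size:>3} {ethernet_packet_size(size):>4}")
```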

Figure 3.9: Network load for different PTP message (chart; x-axis: message rate 128, 64, 32, 16 and 8 messages/s; y-axis: occupation of the network from 0% to 0.018%; series: Sync and follow up, E2E Delay average, E2E Delay first hop, P2P Delay)



Figure 3.9 shows the network load for the different messages in a network with 9 nodes. As discussed in the section End-to-End vs Peer-to-Peer, the network load between two nodes depends on where in the chain the nodes are located when E2E is used. The worst case is between the master node and the first slave node. Even though the packets are slightly bigger when P2P is used instead of E2E, the total network load is much smaller with P2P.

3.2.5 Frequency accuracy

The ToD is a register that contains the time and is updated every clock cycle. If the frequency at a slave differs from the frequency at the master, this leads to the error that was discussed in Section 3.1.4. A solution would be to change the accumulated value at the slave: instead of accumulating 4 ns every clock cycle, 3.9996 ns can be used. This would compensate for a slave clock that is 100 ppm faster than the master clock. This increases the accuracy of the frequency, but there will still be errors if the slave clock starts to drift.
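The adjusted increment follows directly from the frequency error. The sketch below uses the first-order relation, nominal increment times (1 - error), which is accurate enough at ppm levels; the function name is illustrative.

```python
def compensated_increment_ns(nominal_ns, slave_fast_ppm):
    """ToD increment per clock cycle for a slave clock that runs too fast.

    A positive `slave_fast_ppm` means the slave ticks more often than the
    master, so each tick must add slightly less than the nominal period.
    """
    return nominal_ns * (1 - slave_fast_ppm * 1e-6)


increment = compensated_increment_ns(4.0, 100)
print(round(increment, 4))   # 3.9996

# Over one second of master time the fast slave ticks 250e6 * (1 + 100e-6) times.
ticks = 250e6 * (1 + 100e-6)
print(ticks * increment * 1e-9)  # close to 1 s of accumulated time
```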

Another, more accurate, way to solve the problem with the frequency accuracy is to use SyncE. With SyncE the slave clock synchronizes to the master clock at the physical level and has the same stability as the master clock through the whole system. There are some drawbacks to using SyncE; the biggest one is that a more accurate local clock is necessary at the slave nodes. Instead of a clock with an accuracy of 100 ppm, which is the requirement for Ethernet, the local clock needs to be within 4.6 ppm. At most 10 clocks of this type can be cascaded before an even better clock is necessary. SyncE achieves a long term frequency accuracy of 10 parts per trillion [11]. This corresponds to a time error of 10 ps in one second.


Chapter 4

Frequency recovery

4.1 Problems

The frequency problem deals with recovering the radio frequency from Ethernet. As mentioned before, the transmit frequency of Ethernet is 10.3125 GHz. After recovery of the frequency and division by 16 the frequency is 644.53125 MHz. From that, a radio frequency that is a multiple of 3.84 MHz shall be recovered. To recover a radio frequency a phase locked loop (PLL) can be used. To get an idea of how good the result can be, a simulation tool from Analog Devices has been used.¹ The tool has some constraints, and because of them a frequency of 206.25 MHz has been used instead of 644.53125 MHz. This is $\frac{8}{25} \cdot 644.53125$ MHz, or 10.3125 GHz divided by 50 instead of 16. The reason is that the software tool cannot handle more than 3 decimals and needs an fpd that is less than 247.5 MHz. The radio frequency used in the simulations is 491.52 MHz, which is the same as 128 ∗ 3.84 MHz.

4.2 Possible solutions

There is one major decision that needs to be made: either to implement the PLL with an integer divider or to use a fractional PLL with a fractional divider.

The integer divider is the best known one and is used in an integer PLL, which can be seen in Figure 4.1. The input frequency, fpd, is compared with fref, and the output is filtered and fed to a voltage controlled oscillator (VCO), which controls the output frequency. When the system is stable the output frequency is the input frequency multiplied by N, where N is an integer. Figure 4.2 shows a typical phase noise graph for an integer PLL. The top line is the total phase noise and the only one that is of interest in the graph.

¹ ADIsimPLL 3.60
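To illustrate what an integer solution implies for the frequencies used in the simulations, the sketch below computes the integer divider values for a given comparison frequency. It assumes that the recovered clock is first divided by an integer R down to the comparison frequency, which is a hypothetical addition not shown in Figure 4.1, and it uses the 15 kHz common frequency quoted in the Result chapter.

```python
from fractions import Fraction


def integer_dividers(f_in_hz, f_out_hz, f_compare_hz):
    """Return (R, N) such that f_in / R == f_out / N == f_compare.

    R is a hypothetical reference divider and N the feedback divider.
    Raises ValueError if the comparison frequency does not divide both.
    """
    r = Fraction(f_in_hz, f_compare_hz)
    n = Fraction(f_out_hz, f_compare_hz)
    if r.denominator != 1 or n.denominator != 1:
        raise ValueError("comparison frequency must divide both frequencies")
    return int(r), int(n)


# 206.25 MHz recovered clock, 491.52 MHz radio frequency, 15 kHz comparison.
print(integer_dividers(206_250_000, 491_520_000, 15_000))  # (13750, 32768)
```

The large divider values show why an integer PLL ends up with a low comparison frequency for this pair of frequencies.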

Figure 4.1: Block diagram over an integer PLL (blocks: error detector, loop filter, VCO and 1/N feedback divider; signals: fpd, fref and fout)

Figure 4.2: Phase noise graph for an integer PLL

The other way to implement it is to use a fractional PLL. This is very similar to the integer PLL but has one big difference. Instead of using a divider with an integer value N, it can use both the value N and N + 1. By switching between them over a period, the average is seen as a fractional value M. A block diagram of a simple fractional PLL can be seen in Figure 4.3. For example, if N = 3 and a period of 10 clock cycles is used, 3.7 can be achieved by using the value N (3) for 3 clock cycles and N + 1 (4) for 7 clock cycles. The output frequency is in this case M ∗ fpd. With this kind of implementation a higher fpd can be used without losing frequency resolution. This leads to a faster system and a lower general phase noise, but it introduces another problem: the fractional PLL has spurs in the phase noise. A typical graph of the phase noise can be seen in Figure 4.4. The top line is the total phase noise and the only one that is of interest in the graph.
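The averaging of N and N + 1 can be sketched as follows. The accumulator used to spread the N + 1 cycles over the period is an assumption about how the switching could be done; the thesis only states that the divider changes between the two values over a period.

```python
def average_division(n, cycles_n, cycles_n_plus_1):
    """Average divider value when N and N + 1 are mixed over one period."""
    total = cycles_n + cycles_n_plus_1
    return (n * cycles_n + (n + 1) * cycles_n_plus_1) / total


def divider_sequence(n, numerator, period):
    """Divider value per cycle, using N + 1 in `numerator` of `period` cycles.

    A simple accumulator spreads the N + 1 cycles evenly over the period.
    """
    accumulator = 0
    sequence = []
    for _ in range(period):
        accumulator += numerator
        if accumulator >= period:
            accumulator -= period
            sequence.append(n + 1)
        else:
            sequence.append(n)
    return sequence


print(average_division(3, 3, 7))        # 3.7
seq = divider_sequence(3, 7, 10)
print(seq)
print(sum(seq) / len(seq))              # 3.7
```

The periodic pattern in the sequence is what gives rise to the fractional spurs mentioned in the text.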

Figure 4.3: Block diagram over a fractional PLL (blocks: error detector, loop filter, VCO and 1/M feedback divider switching between N and N + 1; signals: fpd, fref and fout)

Figure 4.4: Phase noise graph for a fractional PLL

By reducing the loop bandwidth and adding an extra pole in the filter, the spurs can be reduced. In Figure 4.5 the spurs are reduced by almost 90 dBc for the worst fractional spur. But even if they are reduced a lot, they still exist and may cause problems.



Figure 4.5: Phase noise graph for a fractional PLL with reduced spurs

Choosing the best PLL solution is difficult and is a trade-off between properties like phase noise, spurs and lock time. It is up to the designer to choose which properties are most important for each application and to make a suitable design according to the wanted properties.


Chapter 5

Result

The different choices for implementing PTP have been discussed, and to get a highly accurate solution PTP needs to be implemented with hardware support. The hardware will be located at the XGMII between the PCS and the MAC unit. Because hardware support is necessary for the timestamp, it is convenient to use two-step synchronization, which minimizes the complexity of the hardware without affecting the time accuracy. The best solution for forwarding nodes is to implement them as TCs. If the nodes contain a network element and do not only forward messages, an OC needs to be attached to the TC.

When choosing between P2P and E2E, they are equally accurate. P2P is recommended because it affects the network load less than E2E if more than 3 nodes are used. The work is also moved from the master to the slave nodes. An E2E solution would also need extra memory to store the correction time for the delay measurement, which is not necessary with P2P. If the network that is going to be used is not totally owned by the user, and it cannot be guaranteed that all nodes have a PTP implementation, E2E is to be preferred, but in that kind of network the accuracy will be heavily reduced.

Both sync and Pdelay Req messages will be sent at a rate of 32 messages/s. As a consequence, Follow Up, Pdelay Resp and Pdelay Resp Follow Up will be sent at the same rate. This is a high value, chosen to make sure that the message interval does not affect the accuracy. The announce message is of less interest and its rate will be set to 1 message/s. In total the PTP messages will occupy the network with

$$8 \cdot [102 + 32 \cdot (84 + 84) + 32 \cdot (92 + 92 + 92)] = 114480 \text{ bits/s}$$
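The sum can be checked with a few lines of code. The sketch below takes the packet sizes from Table 3.1 and the message rates chosen above; the 10 Gbit/s link rate used for the percentage is an assumption about the data rate of the link, and the names are illustrative.

```python
ANNOUNCE = 102                      # Ethernet packet sizes in bytes (Table 3.1)
SYNC = FOLLOW_UP = 84
PDELAY_REQ = PDELAY_RESP = PDELAY_RESP_FOLLOW_UP = 92

LINK_RATE_BITS_PER_S = 10e9         # assumed 10 Gbit/s data rate


def ptp_load_bits_per_s(event_rate=32, announce_rate=1):
    """Total PTP traffic on one link for two-step P2P operation."""
    bytes_per_s = (
        announce_rate * ANNOUNCE
        + event_rate * (SYNC + FOLLOW_UP)
        + event_rate * (PDELAY_REQ + PDELAY_RESP + PDELAY_RESP_FOLLOW_UP)
    )
    return 8 * bytes_per_s


load = ptp_load_bits_per_s()
print(load)                                            # 114480
print(f"{100 * load / LINK_RATE_BITS_PER_S:.4f} %")    # 0.0011 %
```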

In 10GBASE Ethernet this corresponds to 0.001% of the network.

The frequency accuracy is preferably transferred with SyncE. In comparison to PTP, where only the end nodes need to be implemented with PTP, SyncE needs all nodes in the chain to be implemented with SyncE. If one node is not implemented with this technique, the rest of the chain cannot be guaranteed a good synchronization.


To get an estimate of the time error from when an event message arrives until it is timestamped, all the time errors need to be summed up. First of all there is the asymmetry. With a well designed PCS, the asymmetry in the node can be reduced to only the variable delay in the FIFO. The error will then be 0.68 ns if the standard width of 16 bits is used between the PCS and the PMA. The asymmetry from the communication medium should then be added, but no number is presented for it here. The error that arises from the resolution of the clock is half of the clock period. By using the recovered clock of 644.53 MHz the error is 0.78 ns. With the help of SyncE for frequency synchronization the frequency error is under 1 ps and can therefore be neglected. In total the error does not need to be greater than 1.46 ns if the asymmetry in the medium is neglected. To achieve this accuracy a well designed PCS and FIFO are needed.
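The budget can be summed in a short sketch. It assumes that the FIFO contribution is half of the 1.36 ns variable delay and that the resolution contribution is half a period of the recovered clock (10.3125 GHz divided by 16); the medium asymmetry and the SyncE contribution are left out, as in the text.

```python
FIFO_VARIABLE_DELAY_NS = 1.36              # for a 16-bit PCS/PMA width
RECOVERED_CLOCK_HZ = 10.3125e9 / 16        # 644.53125 MHz


def fifo_error_ns():
    """Residual FIFO error after compensating with the average delay."""
    return FIFO_VARIABLE_DELAY_NS / 2


def resolution_error_ns():
    """Half a clock period of the recovered clock."""
    return 0.5 * 1e9 / RECOVERED_CLOCK_HZ


total = fifo_error_ns() + resolution_error_ns()
print(f"FIFO:       {fifo_error_ns():.2f} ns")
print(f"Resolution: {resolution_error_ns():.2f} ns")
print(f"Total:      {total:.2f} ns")
```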

It is possible to recover a radio frequency from Ethernet, but the highest common frequency is 15 kHz and there is no general way to recover a radio frequency that suits all applications.


Chapter 6

Conclusion

The proposed solution is rather expensive, and if the accuracy is not needed a simpler and cheaper solution can be used. For example, two synthesizers are necessary in the proposed solution: one to get a frequency that is traceable to the PRC according to SyncE and one to recover a radio frequency from the medium. SyncE also needs a more expensive local oscillator than ordinary, non-synchronous, Ethernet.

Texas Instruments has a PTP device (DP83640) which also supports SyncE, activated through a register. The test results show that by activating SyncE the peak-to-peak time error can be reduced from 119.25 ns to 700 ps [1]. The standard deviation also decreases, from 9.537 ns down to 77.5 ps. In comparison to the theoretical result presented in this report, an error of 1.46 ns seems to be a realistic peak-to-peak value. In the test the standard deviation is almost one tenth of the peak-to-peak value, which gives a roughly estimated standard deviation of 150 ps.

Regarding the frequency recovery, there will always be a trade-off, and each application needs to decide which property is the most important. To get the desired result for a specific application, a more detailed investigation needs to be done. It would also be preferable to have measurements and not only simulations and theoretical values.

Bibliography

[1] AN-1730 DP83640 Synchronous Ethernet mode: Achieving sub-nanosecond accuracy in PTP applications. http://www.ti.com/lit/an/snla100a/snla100a.pdf.

[2] Ethernet IEEE 802.3 tutorial - an overview or tutorial of Ethernet, IEEE 802.3 used widely for local area network, LAN applications.

[3] IEEE-1588 standard for a precision clock synchronization protocol for networked measurement and control systems.

[4] Timing and synchronization aspects in packet networks. http://handle.itu.int/11.1002/1000/12015.

[5] Timing characteristics of a synchronous ethernet equipment slave clock. http://handle.itu.int/11.1002/1000/10909.

[6] Information technology - open systems interconnection - basic reference model: The basic model. Nov 1994.

[7] IEEE standard for a precision clock synchronization protocol for networked measurement and control systems. IEEE Std 1588-2008 (Revision of IEEE Std 1588-2002), pages c1–269, July 2008.

[8] IEEE standard for ethernet - section 1. IEEE Std 802.3-2012 (Revision to IEEE Std 802.3-2008), Dec 2012.

[9] IEEE standard for ethernet - section 4. IEEE Std 802.3-2012 (Revision to IEEE Std 802.3-2008), Dec 2012.

[10] Alexandra Dopplinger and Jim Innis. Using IEEE 1588 for synchronization of network-connected devices.

[11] J.-L. Ferrant, M. Gilson, S. Jobert, M. Mayer, M. Ouellette, L. Montini, S. Rodrigues, and S. Ruffini. Synchronous ethernet: a method to transport synchronization. Communications Magazine, IEEE, 46(9):126–134, September 2008.

[12] Jean-Loup Ferrant, Mike Gilson, Sébastien Jobert, Michael Mayer, Laurent Montini, Michel Ouellette, Silvana Rodrigues, and Stefano Ruffini. Standards in telecom packet networks using synchronous ethernet and/or IEEE 1588. Synchronous Ethernet and IEEE 1588 in Telecoms, page 329, 2013.

[13] J. Han and D.-K. Jeong. A practical implementation of IEEE 1588-2008 transparent clock for distributed measurement and control systems. IEEE Transactions on Instrumentation and Measurement, 59(2):433–439, 2010.

[14] D. Mohl and M. Renz. Improved synchronization behavior in highly cascaded networks. In Precision Clock Synchronization for Measurement, Control and Communication, 2007. ISPCS 2007. IEEE International Symposium on, pages 96–99, Oct 2007.

[15] Michel Ouellette, Ji Kuiwen, Liu Song, and Li Han. Using IEEE 1588 and boundary clocks for clock synchronization in telecom networks. IEEE Communications Magazine, 49(2):164–171, 2011.

[16] S. Rodrigues. IEEE-1588 and synchronous ethernet in telecom. In Precision Clock Synchronization for Measurement, Control and Communication, 2007. ISPCS 2007. IEEE International Symposium on, pages 138–142, Oct 2007.

Division, Department: ISY, Department of Electrical Engineering, 581 83 Linköping

Date: February 20, 2015

Language: English

Report category: Final thesis (Examensarbete)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-115882

Title of series, numbering: Linköping Studies in Science and Technology, Thesis No. 4824

Title: Timing and Synchronization over Ethernet

Author: Emil Lundqvist

Keywords: Ethernet, PTP, SyncE, Time & Frequency
