ntp architecture, protocol and algorithmsmills/database/brief/arch/arch.pdfntp architecture,...
TRANSCRIPT
NTP Architecture, Protocoland Algorithms
Sir John Tenniel; Alice’s Adventures in Wonderland,Lewis Carroll
22-Jul-07 1
David L. MillsUniversity of Delawarehttp://www.eecis.udel.edu/~millsmailto:[email protected]
Process decomposition
RemoteServers
Server 1
Server 2
Peer/Poll1
Server 3
Peer/Poll2
Peer/Poll3
Selectionand
ClusteringAlgorithms
CombiningAlgorithm
Loop Filter
VFO
Clock DisciplineProcess
SystemProcess
Peer/PollProcesses
Clock Adjust
22-Jul-07 2
o Peer process runs when a packet is received.
o Poll process sends packets at intervals determined by the clock discipline process and remote server.
o System process runs when a new peer process update is received.
o Clock discipline process runs at intervals determined by the measured network phase jitter and clock oscillator (VFO) frequency wander.
o Clock adjust process runs at intervals of one second.
Clock AdjustProcess
NTP Protocol Header Format (32 bits)
NTP protocol header and timestamp formats
Strat PollLI ModeVN
Root Delay
Root Dispersion
Reference Identifier
Reference Timestamp (64)
Originate Timestamp (64)
Receive Timestamp (64)
Transmit Timestamp (64)
LI leap warning indicatorVN version number (4)Strat stratum (0-15)Poll poll interval (log2)Prec precision (log2)
Seconds (32) Fraction (32)
NTP Timestamp Format (64 bits)
Value is in seconds and fractionsince 0h 1 January 1900Cryptosum
Prec
NTPv4 Extension Field
22-Jul-07 3
NTP v3 and v4
Transmit Timestamp (64)
Message Hash (64 or 128)
Authenticator uses DES-CBC or MD5 cryptosumof NTP header plus extension fields (NTPv4)
Key/Algorithm IdentifierAuthenticator
(Optional)
Extension Field 1 (optional)
Extension Field 2… (optional)
NTP v4 only
Extension Field(padded to 32-bit boundary)
Field Length Field Type
NTPv4 Extension Field
Last field padded to 64-bit boundary
authentication only
NTP packet header format
Strat PollLI ModeVN
Root Delay
Root Dispersion
Reference Identifier
Reference Timestamp (64)
Originate Timestamp (64)
Prec
leap leap indicator (LI)version version number (VN)mode protocol modestratum stratumτ poll interval (log2 s)ρ clock reading precision (log2 s)Δ root delayΕ root dispersion
Packet headerVariables Description
22-Jul-07 4
Receive Timestamp (64)
Transmit Timestamp (64)
Ε root dispersionrefid reference IDreftime reference timestampT1 originate timestamp T2 receive timestamp T3 transmit timestampT4 destination timestamp*MAC MD5 message hash (optional)
MAC (optional 160)
* Strictly speaking, T4 is not a packet variable; it is the value of the system clock upon arrival.
NTP date and timestamp formats and important dates
Year M D JDN NTP Date Era Timestamp-4712 1 1 0 -208,657,814,400 -49 1,795,583,104 First day Julian Era
1 1 1 1,721,426 -59,926,608,000 -14 202,934,144 First day Common Era1582 10 15 2,299,161 -10,010,304,000 -3 2,874,597,888 First day Gregorian Era1900 1 1 2,415,021 0 0 0 First day NTP Era 01970 1 1 2,440,588 2,208,988,800 0 2,208,988,800 First day Unix Era1972 1 1 2,441,318 2,272,060,800 0 2,272,060,800 First day UTC2000 1 1 2,451,545 3,155,673,600 0 3,155,673,600 First day 21st century2036 2 7 2,464,731 4,294,944,000 0 4,294,944,000 Last day NTP Era 02036 2 8 2,464,732 4,295,030,400 1 63,104 First day NTP Era 1
22-Jul-07 5
2036 2 8 2,464,732 4,295,030,400 1 63,104 First day NTP Era 13000 1 1 2,816,788 34,712,668,800 8 352,930,432 4294967296
Era (32) Seconds (32) Fraction (32)
Seconds (64) Fraction (32 or 64)
NTP Date (signed, twos-complement, 128-bit integer)
NTP Timestamp (unsigned 64-bit integer)Era Number
Process decomposition
RemoteServers
Server 1
Server 2
Peer/Poll1
Server 3
Peer/Poll2
Peer/Poll3
Selectionand
ClusteringAlgorithms
CombiningAlgorithm
Loop Filter
VFO
Clock DisciplineProcess
SystemProcess
Peer/PollProcesses
Clock Adjust
22-Jul-07 6
o Peer process runs when a packet is received.
o Poll process sends packets at intervals determined by the clock discipline process and remote server.
o System process runs when a new peer process update is received.
o Clock discipline process runs at intervals determined by the measured network phase jitter and clock oscillator (VFO) frequency wander.
o Clock adjust process runs at intervals of one second.
Clock AdjustProcess
NTP one-step on-wire protocol
0
0
t2 = clock
t3t4t5t1
T5 = t5T6
t5<>T1?
T6 = t6
org
rec
T1 = t1T2
t1<>0?
T2 = t2
t2 = rec t6 = rec
t5 = org
t3t2 t6 t7
t6 = clock
T3 = clock T7 = clock
t1 = org
org originate timestamp rec receive timestampxmt transmit timestamp
PacketVariables
Peer B
StateVariables
State VariablesName Descriptiont7 = xmit
t3==T3?xmt
t3 = xmit
0
tn originate timestamptn+1 receive timestamp
Packet Header VariablesName Description
t2 t3 t6 t7
22-Jul-07 7
0
0
t5t6t7
org
rec
T3 = t3T4
t7 <> T3?
T8 = t8
t4 = rec
t3 = org
0
0 t1t2t3
t3 <> 0?
T4 = t4
t4t1 t5 t8
T1 = clock T5 = clock
t4 = clock t8 = clock
PacketVariables
Peer A
StateVariables
t1 = xmit
xmt
t5 = xmit
t5 == T5?t1 == T1?
tn+1 receive timestamptn+2 transmit timestamptn+3 destination timestamp
t7 <> T3?
t5 == T5?
org Duplicate Test
xmt Bogus Test
t1 t4 t5 t8
Timestamp interleaving
t4
t3t2
t1 t5
t6
t8
t7
t12
t11t10
t9
t4
t2
t1
t6
t8
t3
t12
t7t10
t5
One Step
Two Step
o In the diagrams, transmit timestamps carry odd numbers, receive timestamps carry even numbers.
• Receive timestamps are available immediately.
• In one-step mode, transmit timestamps are conveyed in the transmitted packet.
• In two-step mode, transmit timestamps are conveyed in the following transmitted packet.
• It takes two roundtrips to accumulate all four timestamps
22-Jul-07 8
NTP two-step on-wire protocol
t1 t4 t8t5 t5t6t4 = rec
t3 = org
0
0 0
t20
t4 = clock t8 = clock
t5 = xmit0
t8t5 t9t10
t11(T7)
t8 = rec
t7 = org
t12 = clock
t9 = xmit
t4
t3t2
t1 t5
t6
t8
t7
t12
t11t10
t9
t1 t4 t5 t8t1 t4 t5 t8 t9 t12
t7 (T3)PacketVariables
22-Jul-07 9
T1= clock T9 = clock
StateVariables
0
porg
prec
T1
T2 = t2T4
T2
T5
T6
pxmt T4 = t4
T1
T8
0
0
rec
T3 = t3T4
t7 <> T3?
T8 = t8
t3 <> 0?
T4 = t4
T5 = clock
0
xmt t5 == T1?t1 == T1?
T8
T6
T9
T6
T5
T8
T7 = t7T8
t11 <> T7?
T12 = t12
t9 == T5?
org
0
( )[ ]4312 )(2
1θ TTTT −+−=
( ) ( )2314δ TTTT −−−=( )[ ]8756 )(
21θ TTTT −+−=
( ) ( )6758δ TTTT −−−=
Round 1Round 2
Round 3
ExtendedStateVariables
IEEW 1588 (PTP) master-slave protocol
T4
T3T2
T1Master
Slave
SyncFollow_Up (T1)
Delay_Req
Delay_Resp(t4)
o Ethernet NIC hardware strikes a timestamp after the preamble and before the data separately for transmit and receive.
22-Jul-07 10
before the data separately for transmit and receive.
o In each round master sends Sync message at T1; slave receives at T2.
o In one-step variant T1 is inserted just before the data in the Sync message; in two-step variant t1 is sent later in a Follow_Up message.
o Slave sends Delay_Req message at T3; master sends Delay_Respmessage with T4. Compute master offset θ and roundtrip delay δ:
( )[ ]4312 )(21θ TTTT −+−= ( ) ( )2314δ TTTT −−−=
Transition matrix
Mode ACTIVE PASSIVE CLIENT SERVER BCAST
NO_PEER NEWPS FXMIT NEWMC NEWBC
ACTIVE PROC PROC
PASSIVE PROC ERROR
Packet Mode
Mod
e
22-Jul-07 11
PASSIVE PROC ERROR
CLIENT PROC
SERVER
BCAST
BCLIENT ERROR PROC
Ass
ocia
tion
M
The default (empty box) behavior is to discard the packet without comment.
Packet sanity tests
Test Comment Code Condition RoutinePacket Flashers
drop implementation error none T 3 = 0 or (T 1 = 0 and T 2 ≠ 0) or (T 1 ≠ 0 and T 2 = 0) receive
1 duplicate packet pkt_dupe T 3 = xmt receive
2 bogus packet pkt_bogus T 1 ≠ org receive
3 invalid timestamp pkt_proto mode ≠ BCST and T 1 = 0 and T 2 = 0 receive4 access denied pkt_denied access restricted, untrusted key, etc. receive5 authentication error pkt_auth MD5 message hash fails to match message digest receive6 peer not synchronized pkt_unsync leap = 11 or stratum >= MAXSTRAT or T 3 < reftime packet
7 invalid distance pkt_dist Δ R < 0 or E R < 0 or Δ R / 2 + E R > MAXDISP packet
22-Jul-07 12
R R R R
8 autokey keystream error pkt_autokey MD5 autokey hash fails to match previous key ID receive9 autokey protocol error pkt_crypto key mismatch, certificate expired, etc. receive
Peer Flashers10 peer stratum exceeded peer_stratum stratum > sys_stratum in non-symmetric mode accept11 peer distance exceeded peer_dist distance greater than MAXDIST accept12 peer synchronization loop peer_loop peer is synchronized to this host accept13 peer unfit for synchronization peer_unfit unreachable, unsynchronized, noselect accept
T1
T3T2
T4
o The most accurate offset θ0 is measured at the lowest delay δ0 (apex of
)()( 2314 TTTT −−−=δ)]()[(2
14312 TTTT −+−=θ
Server
Client
Clock filter algorithm
x
θ0
22-Jul-07 13
o The most accurate offset θ0 is measured at the lowest delay δ0 (apex of the wedge scattergram).
o The correct time θ must lie within the wedge θ0 ± (δ − δ0)/2.
o The δ0 is estimated as the minimum of the last eight delay measurements and (θ0 ,δ0) becomes the peer update.
o Each peer update can be used only once and must be more recent than the previous update.
Clock filter performance
−3
−2
−1
0
1
2
3
4
Offset (ms)
−3
−2
−1
0
1
2
3
4
Offset (ms)
22-Jul-07 14
0 5 10 15 20−4
Time (hr)0 5 10 15 20
−4
Time (hr)
o Left figure shows raw time offsets measured for a typical path over a 24-hour period (mean error 724 μs, median error 192 μs)
o Right graph shows filtered time offsets over the same period (mean error 192 μs, median error 112 μs).
o The mean error has been reduced by 11.5 dB; the median error by 18.3 dB. This is impressive performance.
Correct NTPCorrect DTS
Clock select principles
o The correctness interval for any candidate is the set of points in the interval of length twice the synchronization distance centered at the computed offset.
C
A
B
D
correctness interval = q - l £ q0 £ q + l m = number of clocksf = number of presumed falsetickersA, B, C are truechimersD is falseticker
22-Jul-07 15
computed offset.
o The DTS interval contains points from the largest number of correctness intervals, i.e., the intersection of correctness intervals.
o The NTP interval includes the DTS interval, but requires that the computed offset for each candidate is contained in the interval.
o Formal correctness assertions require at least half the candidates be in the NTP interval. If not, no candidate can be considered a truechimer.
system process: select algorithm
Set the number of midpoints d = 0. Set c = 0. Scan from lowest endpoint to highest. Add one to c for every lowpoint, subtract one for every highpoint, add one to d for every midpoint. If c ≥ m − f, stop; set l = current lowpoint
Set c = 0. Scan from highest endpoint to lowest. Add one to c for every highpoint, subtract one for every lowpoint, add one to d for every midpoint. If
Consider the lowpoint, midpoint and highpoint of these intervals. Sort these values in a list from lowest to highest. Set the number of falsetickers f = 0.
For each of m associations construct a correctness interval[θ – rootdist, θ + rootdist]
22-Jul-07 16
no
highpoint, subtract one for every lowpoint, add one to d for every midpoint. If c ≥ m − f, stop; set u = current highpoint.
Add one to f. Is f < m / 2?
Failure; a majority clique could not be found..
Success; the intersection interval is [l, u].
yes
If d ≤ f and l < u?
no
yes
Cluster principles
ϕS1
ϕR1
ϕR2
ϕR3
ϕR4
ϕR2
ϕR3
ϕR4
ϕS3
peer jitter
select jitter
22-Jul-07 17
o Candidate 1 is further from the others, so its select jitter ϕS1 is highest.
o (a) ϕmax = ϕS1 and ϕmin = ϕR2. Since ϕmax > ϕmin, the algorithm prunes candidate 1 to reduce select jitter and continues.
o (b) ϕmax = ϕS3 and ϕmin = ϕR2. Since ϕmax < ϕmin, pruning additional candidates will not reduce select jitter. So, the algorithm ends with ϕR2, ϕR3 and ϕR4 as survivors.
a b
system process: cluster algorithm
For each candidate compute the selection jitter ϕS (RMS peer offset differences between this and all other candidates).
Select ϕ as the candidate with maximum Λϕ .
Let (θ, ϕR, Λ) represent a candidate with peer offset θ, jitter ϕ and a weight factor Λ equal to stratum as the high order field and root
distance as the low order field.
Sort the candidates by increasing Λ. Let n be the number of candidates and nmin ≤ n the minimum number of survivors.
22-Jul-07 18
Select ϕmax as the candidate with maximum ΛϕS.
Delete the outlyer candidate with ϕmax; reduce n by one.
Done. The remaining cluster survivors are the pick of the litter.
no
yes
Select ϕmin as the candidate with minimum ϕ.
ϕmax < ϕmin or n ≤ nmin or ϕmax is prefer peer?
NTP dataflow analysis
o Each server provides delay Δ and dispersion Ε relative to the root of the synchronization subtree.
Server 1Δ, Ε
Server 2Δ, Ε
Peer 1θ, δ, ε, ϕ
Server 3Δ, Ε
Peer 2θ, δ, ε , ϕ
Peer 3θ, δ, ε , ϕ
Selectionand
CombiningAlgorithms
SystemΘ, Δ, Ε, ϑ
22-Jul-07 19
synchronization subtree.
o As each NTP message arrives, the peer process updates peer offset θ, delay δ, dispersion ε and jitter ϕ.
o At system poll intervals, the clock selection and combining algorithms updates system offset Θ, delay Δ, dispersion Ε and jitter ϑ.
o Dispersions ε and Ε increase with time at a rate depending on specified frequency tolerance φ.
Error budget - notationo System variables
Θ clock offset
Δ root delay
Ε root dispersion
ϕs selection jitter
ϕ jitter
τ interval since last update
m number of peers
o Constants (peers A and B)
ρ maximum reading error
φ maximum frequency error
w dispersion normalize: 0.5
o Packet variables
ΔB peer root delay
ΕB peer root dispersion
o Sample variables
22-Jul-07 20
o Peer variables
θ clock offset
δ roundtrip delay
ε dispersion
ϕr filter jitter
n number of filter stages
τ interval since last update
o Sample variables
T1, T2, T3, T4 protocol timestamps
x clock offset
y roundtrip delay
z dispersion
τ interval since last update
Definitions
o Precision: elapsed time to read the system clock from userland.
o Resolution: significant bits of the timestamp fraction.
o Maximum error: maximum error due all causes (see error budget).
o Offset: estimated time offset relative to the server time.
o Jitter: exponential average of first-order time differences
o Frequency: estimated frequency offset relative to UTC.
22-Jul-07 21
o Frequency: estimated frequency offset relative to UTC.
o Wander: exponential average of first-order frequency differences.
o Dispersion: maximum error due oscillator frequency tolerance.
o Root delay: accumulated roundtrip delay via primary server.
o Root dispersion: accumulated total dispersion from primary server.
o Estimated error: RMS accumulation from all causes (see error budget).
Time values and computations
Δ=ΔB ρ=ρB
∑ +=i
iiz12
ε
)]()[( 431221 TTTTx −+−=
)()( 2314 TTTTy −−−=
)(ρρ 140 TTz R −Φ++=
Σ
μ1 Φ+=+ ii zz
)θcombine(Θ j=
δΔΔ += R
θμεΕΕ ++Φ++= ϕRΣ
Packet Variables Peer Variables
Client
System Variables
21 )( jj
S mθθϕ −= ∑
0δ y=
0θ x=
20
1 )( ii
xxn
−= ∑ϕ
22-Jul-07 22
Client
Σ
22Sϕϕϑ +=
μΕΕ Φ+=R ΔΔ =R
Server
o Packet variables are computed directly from the packet header.
o Peer variables are groomed by the clock filter.
o System variables are groomed from the available peers.
Rρ
Clock discipline algorithm
Vd
Vc Phase/FreqPrediction
Clock Filter
ClockAdjust
PhaseDetector
VFO
Vs
θr+
θc−NTP
Loop Filter
x
y
22-Jul-07 23
o Vd is a function of the phase difference between NTP and the VFO.
o Vs depends on the stage chosen on the clock filter shift register.
o x and y are the phase update and frequency update, respectively, computed by the prediction functions.
o Clock adjust process runs once per second to compute Vc, which controls the frequency of the local clock oscillator.
o VFO phase is compared to NTP phase to close the feedback loop.
NTPθr+
θo−VsVd
Vc
y
NTP clock discipline with PPS steering
Loop Filter
Clock Filter
FrequencyEstimator
PhaseDetector
VFO
PPS
22-Jul-07 24
o NTP daemon disciplines variable frequency oscillator (VFO) phase Vc
relative to accurate and reliable network sources.
o Kernel disciplines VFO frequency y to pulse-per-second (PPS) signal.
o Clock accuracy continues to be disciplined even if NTP daemon or sources fail.
o In general, the accuracy is only slightly degraded relative to a local reference source.
Measured PPS time error for Alpha 433
22-Jul-07 25
Standard error 51.3 ns
Further information
o NTP home page http://www.ntp.org
• Current NTP Version 3 and 4 software and documentation
• FAQ and links to other sources and interesting places
o David L. Mills home page http://www.eecis.udel.edu/~mills
• Papers, reports and memoranda in PostScript and PDF formats
• Briefings in HTML, PostScript, PowerPoint and PDF formats
• Collaboration resources hardware, software and documentation
22-Jul-07 26
• Collaboration resources hardware, software and documentation
• Songs, photo galleries and after-dinner speech scripts
o Udel FTP server: ftp://ftp.udel.edu/pub/ntp
• Current NTP Version software, documentation and support
• Collaboration resources and junkbox
o Related projects http://www.eecis.udel.edu/~mills/status.htm
• Current research project descriptions and briefings