distributed synchronization
04/19/23 ICSS741 - Time and Coordination 1
Distributed Synchronization
• In single CPU systems
– Semaphores and monitors
– Essentially shared memory solutions
• How about distributed synchronization?
– Relevant information is scattered
– Processes make decisions based on local information
– A single point of failure in a system should be avoided
– No common clock or other precise global time source exists
What Is Needed
• In order to coordinate events in a distributed system
– We may need to know the time at which a particular event took place
– We may need to determine the order in which two events took place, or should take place, without respect to the time they actually occur
• Synchronization requires either
– Global time
– Global ordering
Time Synchronization
• Is it possible to synchronize all clocks to produce a single, unambiguous time standard?
• Time synchronization need not be absolute
– What usually matters is that processes agree on the order in which events occur (not necessarily the time at which they occur)
Time and Coordination
• Basically two problems with time
– External synchronization
  • Synchronize clocks with an authoritative external source of time
– Internal synchronization
  • The internal consistency of the clocks is what matters, not whether they are close to the real time
Astronomical Time
• Since the 17th century time has been measured astronomically
– The event of the sun reaching the highest point in the sky is called the transit of the sun
– The interval between two consecutive transits of the sun is called a solar day
• In the 1940s, it was established that the earth’s rotation is not constant
– The earth is spinning slower
– 300 million years ago there were about 400 days per year
Atomic Time
• The atomic clock was invented in 1948
– One second is the time it takes the cesium-133 atom to make 9,192,631,770 transitions
– Currently about 50 cesium-133 clocks exist
– Periodically they are averaged to produce international atomic time (TAI)
– The Bureau International de l’Heure (BIH) maintains the official clock
Leap Seconds
• Currently, 86,400 TAI seconds are about 3 msec less than a mean solar day
– Not a problem until noon becomes 6am
• BIH solves the problem by inserting leap seconds to compensate for the difference
– Leap seconds are added whenever the discrepancy grows to 800 msec
– Power companies will increase their frequencies to compensate
• UTC (Coordinated Universal Time) is the result
Obtaining Accurate Time
• UTC is an international standard for the current time
– WWV shortwave radio from Fort Collins (accuracy 0.1 to 10 milliseconds)
– GOES satellites (0.1 milliseconds)
– GPS satellites (1 millisecond)
Physical Clocks
• Computers each contain their own physical clock
– Timer might be a better word…
– They use a crystal that oscillates at a known frequency
– A count of the oscillations is maintained
– Software typically takes this count, divides it down, and stores it as a number in a register
• Most systems provide date/time from the counter
• Ordering events, in a single machine, with such a clock is easy
– Provided the clock resolution is fine enough
Clock Drift
• Crystal-based clocks are subject to drifting
– Drift rate: the change in the offset between the clock and a nominal perfect reference clock per unit of time measured by the reference clock
• Typical drift rates
– Quartz crystals: 10^-6 (a difference of about one second every 1,000,000 seconds, or 11.6 days)
– Atomic clocks: 10^-13
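The drift figures can be checked with a little arithmetic. This is a minimal sketch; the function name `seconds_to_drift` is made up for illustration, and the two rates are the typical values quoted on the slide.

```python
# Drift rate = change in offset per second of reference time.
QUARTZ_DRIFT = 1e-6    # typical quartz crystal (value from the slide)
ATOMIC_DRIFT = 1e-13   # typical atomic clock (value from the slide)

def seconds_to_drift(offset_seconds: float, drift_rate: float) -> float:
    """Reference time that passes before a clock accumulates the given offset."""
    return offset_seconds / drift_rate

quartz_days = seconds_to_drift(1.0, QUARTZ_DRIFT) / 86_400
atomic_years = seconds_to_drift(1.0, ATOMIC_DRIFT) / (86_400 * 365.25)

print(f"quartz: 1 s of drift after about {quartz_days:.1f} days")
print(f"atomic: 1 s of drift after about {atomic_years:,.0f} years")
```

The quartz result reproduces the 11.6 days quoted above; the atomic clock needs several hundred thousand years to drift by the same second.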
External Synchronization
• Let’s say you have access to a UTC time source
• Assume the machine has a timer that causes an interrupt H times a second
– Current clock value is C
– When UTC time is t, the value of the clock on machine p is Cp(t)
– Ideally Cp(t)=t for all p and t (dC/dt should be 1)
Maximum Drift Rate
[Figure: clock time C plotted against UTC t. Fast clock: dC/dt > 1. Perfect clock: dC/dt = 1. Slow clock: dC/dt < 1.]
Synchronizing Physical Time
• What exactly does it mean to synchronize two clocks?
– Clocks inherently suffer from drifting
– Assuming clocks can always be precisely synchronized is unrealistic
– Define an acceptable range for the difference in time reported by two clocks (clock skew)
• A distributed physical clock synchronization service defines, and maintains, a maximum skew throughout the system.
The Basic Algorithm
• A wants to read B’s clock
1) A sends a request to B
2) B records its current clock value
3) The clock value is sent back to A
4) B’s clock value is adjusted to reflect travel time
5) B’s clock value can now be compared to A’s
• Step 4 is difficult to implement accurately
Interesting Question
• So you have to adjust your time– Your clock is slow – move it ahead
– Your clock is fast – move it back?
• Implementations
– Slow down your clock so it will continually move towards the real time
– Speed up your clock so it will move towards the real time
– Just move your clock ahead to the real time
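Slowing down or speeding up a clock, rather than setting it back, can be sketched as a software clock that absorbs a correction a little at each timer interrupt. Everything here is an assumption made for illustration: the class name `SlewedClock`, the 10 ms tick, and the 1 ms per-tick limit are not from the slides.

```python
class SlewedClock:
    """Software clock that absorbs corrections gradually (hypothetical sketch)."""

    def __init__(self, tick_ms: float = 10.0, max_slew_ms: float = 1.0):
        # The per-tick correction must stay below the tick itself,
        # otherwise a negative correction could make time run backwards.
        assert 0 < max_slew_ms < tick_ms
        self.time_ms = 0.0              # current clock reading
        self.tick_ms = tick_ms          # nominal advance per timer interrupt
        self.max_slew_ms = max_slew_ms  # largest correction applied per tick
        self.pending_ms = 0.0           # correction still to be absorbed

    def adjust(self, delta_ms: float) -> None:
        """Request a correction: positive if the clock is slow, negative if fast."""
        self.pending_ms += delta_ms

    def tick(self) -> None:
        """Called on each timer interrupt."""
        # Clamp the step so the clock only runs a bit faster or slower.
        step = max(-self.max_slew_ms, min(self.max_slew_ms, self.pending_ms))
        self.pending_ms -= step
        self.time_ms += self.tick_ms + step
```

With these numbers, a clock that is 5 ms fast advances 9 ms per interrupt for five interrupts and then resumes its normal 10 ms step, so its readings never decrease.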
Cristian’s Algorithm
• One machine knows the true time
• Periodically each machine sends a request for the current time
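[Figure: timeline of one exchange. The client sends the request at T0, the time server replies with its current time CUTC after an interrupt handling time I, and the client receives the reply at T1. T0 and T1 are measured with the same clock.]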
Transit Time
• Estimating propagation time
– (T1 - T0) / 2
– (T1 - T0 - I) / 2, if the interrupt handling time I is known
– If the minimum possible propagation delay is known, the estimate can be made better
• Accuracy can be improved by taking several measurements
– Any measurement in which T1 - T0 exceeds a threshold is discarded (congestion)
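The propagation estimate and the discard rule can be put together as follows. This is a sketch under stated assumptions: the function names are invented, each sample is a tuple (T0, T1, CUTC, I), and averaging the surviving offsets is only one reasonable way to combine several measurements.

```python
from statistics import mean

def cristian_offset(t0, t1, c_utc, interrupt_time=0.0):
    """Correction to add to the local clock, from one request/reply exchange."""
    propagation = (t1 - t0 - interrupt_time) / 2   # one-way delay estimate
    server_now = c_utc + propagation               # server time when the reply arrives
    return server_now - t1

def filtered_offset(samples, max_round_trip):
    """Average the offsets of all exchanges whose round trip is within the threshold."""
    good = [cristian_offset(t0, t1, c_utc, i)
            for (t0, t1, c_utc, i) in samples
            if t1 - t0 <= max_round_trip]
    if not good:
        raise ValueError("every measurement exceeded the round-trip threshold")
    return mean(good)

samples = [
    (100.0, 110.0, 207.0, 2.0),   # round trip 10, kept
    (200.0, 260.0, 300.0, 2.0),   # round trip 60, discarded as congested
    (300.0, 308.0, 405.0, 2.0),   # round trip 8, kept
]
print(filtered_offset(samples, max_round_trip=20.0))   # 100.5
```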
ICMP Timestamp Request/Reply
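Message layout, one 32-bit word per row:
Type (13 = request, 14 = reply) | Code (0) | Checksum
Identifier | Sequence Number
32-bit originate timestamp
32-bit receive timestamp
32-bit transmit timestamp
[Figure annotation: the round-trip time (rtt) is computed from timestamps taken on the same clock, so the difference is accurate]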
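To make the layout concrete, here is a sketch that packs a timestamp request with Python's `struct` module. RFC 792 assigns type 13 to the timestamp request and 14 to the reply, with timestamps in milliseconds since midnight UT. The helper names are made up for this example, and actually sending the packet (which needs a raw socket and usually elevated privileges) is left out.

```python
import struct

ICMP_TIMESTAMP_REQUEST = 13
ICMP_TIMESTAMP_REPLY = 14
LAYOUT = "!BBHHHIII"   # type, code, checksum, identifier, sequence, 3 timestamps

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_timestamp_request(identifier: int, sequence: int, originate_ms: int) -> bytes:
    """Pack a 20-byte request; receive and transmit fields are left at zero."""
    unchecked = struct.pack(LAYOUT, ICMP_TIMESTAMP_REQUEST, 0, 0,
                            identifier, sequence, originate_ms, 0, 0)
    checksum = internet_checksum(unchecked)
    return struct.pack(LAYOUT, ICMP_TIMESTAMP_REQUEST, 0, checksum,
                       identifier, sequence, originate_ms, 0, 0)

def round_trip_ms(originate_ms, arrival_ms, receive_ms, transmit_ms):
    """Round trip minus remote processing time.

    originate/arrival are read from the sender's clock and receive/transmit
    from the responder's clock, so each difference uses a single clock.
    """
    return (arrival_ms - originate_ms) - (transmit_ms - receive_ms)
```

Because each subtraction in `round_trip_ms` stays on one machine's clock, the result does not depend on how far apart the two clocks are.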
The Berkeley Algorithm
• Time server is active, and polls each machine periodically for its time
– Based on the answers, an average time is computed
– A fault-tolerant average is used
– Machines are then told to slow down, or speed up their clocks
• Suitable for systems where no UTC source is available
Berkeley Algorithm
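[Figure: Berkeley example. The time server’s current time is 740. Machine A reports a current time of 720 over a link with network delay 10, giving an adjusted time of 730. Machine B reports 737 over a link with network delay 5, giving an adjusted time of 742. The average is 737, so A is told to move its clock forward by 7.]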
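The numbers in this example can be reproduced with a short sketch of one polling round. The function name, the dictionary layout, and in particular the `max_deviation` rule used as the fault-tolerant average are assumptions for illustration; the slides do not say how faulty clocks are excluded.

```python
def berkeley_round(server_time, reported, max_deviation=None):
    """One polling round of a Berkeley-style time server.

    reported maps a machine name to (clock value in its reply, network delay).
    Returns the average and the adjustment each machine should apply.
    """
    adjusted = {"server": server_time}
    for name, (value, delay) in reported.items():
        adjusted[name] = value + delay      # compensate for travel time
    # Assumed fault-tolerance rule: ignore clocks too far from the server's.
    usable = [t for t in adjusted.values()
              if max_deviation is None or abs(t - server_time) <= max_deviation]
    average = round(sum(usable) / len(usable))
    return average, {name: average - t for name, t in adjusted.items()}

average, deltas = berkeley_round(740, {"A": (720, 10), "B": (737, 5)})
print(average)   # 737
print(deltas)    # {'server': -3, 'A': 7, 'B': -5}
```

Machine A receives +7, matching the figure, while the server and B are asked to slow down rather than jump back.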
Network Time Protocol
• NTP is used to synchronize the time of a computer client to another server or reference time source
• Client accuracies are typically within a millisecond on LANs and up to a few tens of milliseconds on WANs
• NTP configurations utilize multiple redundant servers and diverse network paths in order to achieve high accuracy and reliability
• Configurations can use authentication to prevent accidental or malicious protocol attacks
NTP Strata
[Figure: NTP hierarchy. Primary servers (stratum 1) sync to a UTC source; secondary servers (stratum 2) sync to primary servers; workstations sit below the secondary servers.]
USNA NTP Time Servers
Rules of Engagement
• Clients should avoid using the primary servers whenever possible
– In most cases the accuracy of the NTP secondary (stratum 2) servers is only slightly degraded relative to the primary servers
– As a group, the secondary servers may be just as reliable
When to Use a Primary
• As a general rule, a secondary server should use a primary server only if
– The secondary server provides synchronization to a sizable population of other servers and clients
– The server operates with at least two and preferably three other secondary servers in a common synchronization subnet
– The administration(s) that operates these servers coordinates other servers within the region, in order to reduce the resources required outside that region
• In order to ensure reliability, clients should spread their use over many different servers
NTP Servers
• http://www.ntp.org (home page for NTP)
• List of Primary Servers (100)
– http://www.eecis.udel.edu/~mills/ntp/clock1.htm
• List of Secondary Servers (110)
– http://www.eecis.udel.edu/~mills/ntp/clock2.htm
• Our server
– timehost.cs.rit.edu
Synchronization Modes
• Servers synchronize in one of three modes
– Multicast
  • Used on high speed LANs
  • Servers periodically broadcast their time
  • Low accuracies, but efficient
– Procedure-call
  • Similar to the operation of Cristian’s algorithm
– Symmetric
  • Used by master servers
  • Pairs of servers exchange information
  • Timing data is retained in order to improve accuracy
NTP Design Goals
• The four primary design goals of NTP are
– Allow accurate UTC synchronization
– Enable survival despite significant losses of connectivity
– Allow frequent resynchronization
– Protect against malicious or accidental interference
Accurate Synchronization
• NTP provides the following information relative to the primary server
– Clock offset
  • Difference between the two clocks
– Round-trip delay
  • Total transmission time for the messages
– Dispersion
  • Offsets are predicted
  • Dispersion is a measure of how much the prediction differs from what was reported
  • Large dispersion values indicate inaccuracy
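Offset and round-trip delay are usually derived from the four timestamps of one request/reply exchange. The slides do not give the formulas; the sketch below uses the standard NTP on-wire calculation, and the names t1 to t4 (client send, server receive, server send, client receive) are labels chosen for this example. Dispersion is not computed here.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client clock when the request leaves
    t2: server clock when the request arrives
    t3: server clock when the reply leaves
    t4: client clock when the reply arrives
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # how far the server is ahead of the client
    delay = (t4 - t1) - (t3 - t2)          # time on the wire, both directions
    return offset, delay

# Client is 5 units behind the server; each direction takes 2 units and the
# server spends 1 unit processing the request.
print(ntp_offset_and_delay(t1=100, t2=107, t3=108, t4=105))   # (5.0, 4)
```

The offset is exact only when the two directions take equal time; any asymmetry shows up as error bounded by half the delay.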
Logical Clocks
• Since physical clocks cannot be perfectly synchronized across a distributed system
– Physical time cannot be used to determine the order in which events occur
• Logical clocks can be used to order events within a distributed system
• The essence of a logical clock is the happens-before relationship
Happens-Before
• The happens-before relationship is denoted a → b
– If a and b are events in the same process, and a occurs before b, then a → b
– If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b
– If a → b, and b → c, then a → c
• Any two events that are not in a happens-before relationship are concurrent
Events
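[Figure: space-time diagram along a physical time axis. Events a and b occur on p1, c and d on p2, and e and f on p3; message m1 goes from p1 to p2 and message m2 from p2 to p3.]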
Lamport’s Logical Clock
• To obtain logical ordering, timestamps that are independent of physical clocks are used
• Lamport clocks follow these rules
– Each process increments its clock between every two consecutive events
– If a sends a message to b, the message includes T(a). Upon receipt, b sets its clock to the greater of T(a)+1 and the current clock
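The two rules fit in a few lines. This is a minimal sketch with an invented class name; on receipt it applies both rules together, taking the greater of T(a)+1 and the locally incremented clock, so a receive event never shares a timestamp with the previous event of the same process.

```python
class LamportClock:
    """Logical clock for one process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock and return the event's timestamp."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Receiving is an event stamped later than both the send and local history."""
        self.time = max(message_time + 1, self.time + 1)
        return self.time

# Three processes: p1 performs a then sends m1 (event b); p2 receives m1 (c)
# and sends m2 (d); p3 performs e and then receives m2 (f).
p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.tick()
b = p1.send()
c = p2.receive(b)
d = p2.send()
e = p3.tick()
f = p3.receive(d)
print(a, b, c, d, e, f)   # 1 2 3 4 1 5
```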
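Lamport’s Algorithm
[Figure: three processes whose clocks tick at different rates (1, 8, and 10 per interval), shown before and after Lamport’s correction. Before: messages A(6), F(10), B(24), C(50), D(60), E(64). After: A(6), B(24), C(50), D(60), E(64), F(71); the second clock reads 0, 8, 16, 24, 32, 40, 48, 61, 69, 77, 85 and the first reads 0, 1, 2, 3, 4, 5, 6, 7, 8, 70, 71.]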
Partial Ordering
• If a → b then L(a) < L(b)
• Note that the converse does not hold
– L(d) < L(e) does not imply that d → e, since d and e might be concurrent
• Plus, L(a) might equal L(b) for two different events
Example
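[Figure: the events diagram labeled with Lamport timestamps along a physical time axis. On p1, a = 1 and b = 2; on p2, c = 3 and d = 4; on p3, e = 1 and f = 5. Message m1 goes from b to c and m2 from d to f.]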
Total Ordering
• Between every two events the clock must tick at least once
• Since events cannot happen at the same time, attach the process number to the low-order end of the time, separated by a decimal point
• Now
– If a happens before b in the same process, C(a) < C(b)
– If a and b represent the sending and receiving of a message, C(a) < C(b)
– For all distinct events a and b, C(a) is not equal to C(b)
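Attaching the process number is equivalent to comparing (time, process) pairs. The sketch below uses a tuple instead of a literal decimal point, which avoids ambiguities such as 40.1 versus 40.10; the `Stamp` type and the sample events are assumptions for illustration.

```python
from typing import NamedTuple

class Stamp(NamedTuple):
    time: int      # Lamport clock value
    process: int   # process number, used only to break ties

# Two events share the Lamport time 1 but occur on different processes.
events = {"a": Stamp(1, 1), "e": Stamp(1, 3), "b": Stamp(2, 1)}

# Tuples compare field by field, so ties on time fall back to the process number.
ordered = sorted(events, key=events.get)
print(ordered)   # ['a', 'e', 'b']
```

The resulting order is total and consistent with happens-before, but the tie-break between a and e is arbitrary: it says nothing about which one really occurred first.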
Vector Clocks
• Vector clocks were designed to overcome the shortcomings of Lamport’s clocks
– A vector clock is an array of times
• The rules:
– Initially, Vi[j] = 0, for i, j = 1, 2, …, N
– Just before pi timestamps an event, it increments Vi[i]
– pi includes the value t = Vi in every message it sends
– When pi receives a timestamp in a message, it takes the component-wise maximum of the two vector timestamps
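A compact sketch of these rules follows. The class name is invented and indices are 0-based here, unlike the 1-based notation on the slide. On receipt the process first merges the incoming timestamp and then increments its own entry to stamp the receive event.

```python
class VectorClock:
    """Vector clock for process `index` out of `n` processes (illustrative)."""

    def __init__(self, index: int, n: int):
        self.index = index
        self.v = [0] * n            # initially all entries are zero

    def event(self) -> tuple:
        """Increment this process's own entry and return the event's timestamp."""
        self.v[self.index] += 1
        return tuple(self.v)

    def send(self) -> tuple:
        """Sending is an event; its timestamp is piggybacked on the message."""
        return self.event()

    def receive(self, t: tuple) -> tuple:
        """Merge with the component-wise maximum, then stamp the receive event."""
        self.v = [max(own, other) for own, other in zip(self.v, t)]
        return self.event()

p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.event()
b = p1.send()        # carries m1
c = p2.receive(b)
d = p2.send()        # carries m2
e = p3.event()
f = p3.receive(d)
print(a, b, c, d, e, f)
# (1, 0, 0) (2, 0, 0) (2, 1, 0) (2, 2, 0) (0, 0, 1) (2, 2, 2)
```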
Example
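[Figure: the events diagram labeled with vector timestamps along a physical time axis. On p1, a = (1,0,0) and b = (2,0,0); on p2, c = (2,1,0) and d = (2,2,0); on p3, e = (0,0,1) and f = (2,2,2). Message m1 goes from b to c and m2 from d to f.]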
Comparing Timestamps
• Vector timestamps are compared as follows
– V = V’ iff V[j] = V’[j] for j = 1, 2, …, N
– V <= V’ iff V[j] <= V’[j] for j = 1, 2, …, N
– V < V’ iff V <= V’ and V != V’
• So what?
– If V(e) < V(e’) then e → e’
– c and e are concurrent since neither V(c) <= V(e) nor V(e) <= V(c)
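The comparison rules translate directly into code. The function names below are made up for this sketch, and the sample timestamps are the ones attached to events a, c, e, and f in the vector clock example.

```python
def leq(v, w) -> bool:
    """V <= V' iff every component of V is <= the matching component of V'."""
    return all(x <= y for x, y in zip(v, w))

def less(v, w) -> bool:
    """V < V' iff V <= V' and the two timestamps differ."""
    return leq(v, w) and tuple(v) != tuple(w)

def concurrent(v, w) -> bool:
    """Neither timestamp is <= the other, so neither event happened before the other."""
    return not leq(v, w) and not leq(w, v)

V_a, V_c, V_e, V_f = (1, 0, 0), (2, 1, 0), (0, 0, 1), (2, 2, 2)

print(less(V_a, V_f))         # True: a happened before f
print(concurrent(V_c, V_e))   # True: c and e are concurrent
```

Unlike Lamport timestamps, the comparison can answer both questions: whether one event precedes another and whether two events are concurrent.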