distributed synchronization


Upload: nicolette-fontaine

Post on 31-Dec-2015


Page 1: Distributed Synchronization

04/19/23 ICSS741 - Time and Coordination 1

Distributed Synchronization

• In single-CPU systems
  – Semaphores and monitors
  – Essentially shared-memory solutions
• How about distributed synchronization?
  – Relevant information is scattered
  – Processes make decisions based on local information
  – A single point of failure in the system should be avoided
  – No common clock or other precise global time source exists

Page 2: Distributed Synchronization

What Is Needed

• In order to coordinate events in a distributed system
  – We may need to know the time at which a particular event took place
  – We may need to determine the order in which two events took place, or should take place, regardless of the time at which they actually occur
• Synchronization requires either
  – Global time
  – Global ordering

Page 3: Distributed Synchronization

Time Synchronization

• Is it possible to synchronize all clocks to produce a single, unambiguous time standard?
• Time synchronization need not be absolute
  – What usually matters is that processes agree on the order in which events occur (not necessarily the time at which they occur)

Page 4: Distributed Synchronization

Time and Coordination

• Basically two problems with time
  – External synchronization
    • Synchronize clocks with an authoritative external source of time
  – Internal synchronization
    • The internal consistency of the clocks is what matters, not whether they are close to the real time

Page 5: Distributed Synchronization

Astronomical Time

• Since the 17th century, time has been measured astronomically
  – The event of the sun reaching the highest point in the sky is called the transit of the sun
  – The interval between two consecutive transits of the sun is called a solar day
• In the 1940s, it was established that the earth’s rotation is not constant
  – The earth is spinning more slowly
  – 300 million years ago there were about 400 days per year

Page 6: Distributed Synchronization

Atomic Time

• The atomic clock was invented in 1948
  – One second is the time it takes the cesium-133 atom to make 9,192,631,770 transitions
  – Currently about 50 cesium-133 clocks exist
  – Periodically they are averaged to produce international atomic time (TAI)
  – The Bureau International de l’Heure (BIH) maintains the official clock

Page 7: Distributed Synchronization

Leap Seconds

• 86,400 TAI seconds are currently about 3 msec shorter than a mean solar day
  – Not a problem until noon becomes 6am
• BIH solves the problem by inserting leap seconds to compensate for the difference
  – Leap seconds are added whenever the discrepancy grows to 800 msec
  – Power companies will increase their frequencies to compensate
• UTC (Coordinated Universal Time) is the result

Page 8: Distributed Synchronization

Obtaining Accurate Time

• UTC is an international standard for the current time
  – WWV shortwave radio from Fort Collins (accuracy 0.1 – 10 milliseconds)
  – GEOS satellites (0.1 milliseconds)
  – GPS satellites (1 millisecond)

Page 9: Distributed Synchronization

Physical Clocks

• Computers each contain their own physical clock
  – Timer might be a better word…
  – They use a crystal that oscillates at a known frequency
  – A count of the oscillations is maintained
  – Software typically takes this count, divides it down, and stores it as a number in a register
• Most systems provide date/time from the counter
• Ordering events, in a single machine, with such a clock is easy
  – Provided the clock resolution is fine enough

Page 10: Distributed Synchronization

Clock Drift

• Crystal-based clocks are subject to drifting
  – Drift rate: the change in the offset between the clock and a nominal perfect reference clock per unit of time measured by the reference clock
• Typical drift rates
  – Quartz crystals – 10^-6 (a difference of about one second every 1,000,000 seconds, or 11.6 days)
  – Atomic clocks – 10^-13
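The drift figures above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper name `days_until_error` is made up for the example and is not part of the slides.

```python
SECONDS_PER_DAY = 86_400


def days_until_error(error_seconds: float, drift_rate: float) -> float:
    """Days a clock with the given drift rate needs to accumulate error_seconds.

    A drift rate of r means the clock gains or loses r seconds for every
    second measured by the reference clock.
    """
    return error_seconds / drift_rate / SECONDS_PER_DAY


quartz_days = days_until_error(1.0, 1e-6)    # about 11.6 days, as on the slide
atomic_days = days_until_error(1.0, 1e-13)   # hundreds of thousands of years
print(f"quartz: {quartz_days:.1f} days per second of error")
print(f"atomic: {atomic_days / 365.25:,.0f} years per second of error")
```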

Page 11: Distributed Synchronization

External Synchronization

• Let’s say you have access to a UTC time source
• Assume the machine has a timer that causes an interrupt H times a second
  – Current clock value is C
  – When UTC time is t, the value of the clock on machine p is Cp(t)
  – Ideally Cp(t) = t for all p and t (dC/dt should be 1)

Page 12: Distributed Synchronization

Maximum Drift Rate

[Figure: clock time C plotted against UTC t. Perfect clock: dC/dt = 1. Fast clock: dC/dt > 1. Slow clock: dC/dt < 1.]

Page 13: Distributed Synchronization

Synchronizing Physical Time

• What exactly does it mean to synchronize two clocks?
  – Clocks inherently suffer from drifting
  – Assuming clocks can always be precisely synchronized is unrealistic
  – Define an acceptable range for the difference in time reported by two clocks (clock skew)
• A distributed physical clock synchronization service defines, and maintains, a maximum skew throughout the system
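One consequence of bounding the skew is a bound on how often clocks must be resynchronized. The sketch below assumes every clock's drift rate is limited by a maximum drift rate rho (so 1 - rho <= dC/dt <= 1 + rho); in the worst case two clocks drift in opposite directions and move apart by 2 * rho per second. The function name and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def resync_interval(max_skew_s: float, max_drift_rate: float) -> float:
    """Longest time two clocks may run unsynchronized and stay within max_skew_s.

    Worst case: one clock runs fast by max_drift_rate and the other slow by
    the same amount, so their difference grows by 2 * max_drift_rate per second.
    """
    return max_skew_s / (2 * max_drift_rate)


# Keeping quartz clocks (drift rate about 1e-6) within 1 ms of each other
# requires a resynchronization roughly every 500 seconds.
interval = resync_interval(0.001, 1e-6)
print(f"resynchronize at least every {interval:.0f} s")
```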

Page 14: Distributed Synchronization

The Basic Algorithm

• A wants to read B’s clock
  1) A sends a request to B
  2) B records its current clock value
  3) The clock value is sent back to A
  4) B’s clock value is adjusted to reflect travel time
  5) B’s clock value can now be compared to A’s
• Step 4 is difficult to implement accurately

Page 15: Distributed Synchronization

Interesting Question

• So you have to adjust your time
  – Your clock is slow – move it ahead
  – Your clock is fast – move it back?
• Implementations
  – Slow down your clock so it will continually move towards the real time
  – Speed up your clock so it will move towards the real time
  – Just move your clock ahead to the real time
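The "slow down" and "speed up" options can be sketched as a clock that absorbs a correction a little at a time, so that its value never jumps backwards. Everything here is a hypothetical illustration: the class name `SlewedClock`, the 10 ms tick, and the 1 ms per-tick adjustment limit are assumptions chosen for the example.

```python
class SlewedClock:
    """Software clock advanced by a periodic timer interrupt.

    A pending correction is spread over many ticks: each tick adds slightly
    more or slightly less than the nominal tick length until the correction
    is used up.
    """

    def __init__(self, tick_ms: float = 10.0, max_adjust_ms: float = 1.0):
        self.value_ms = 0.0
        self.tick_ms = tick_ms
        self.max_adjust_ms = max_adjust_ms
        self.pending_ms = 0.0  # positive: clock is behind; negative: clock is ahead

    def correct(self, offset_ms: float) -> None:
        """Request a correction without changing the clock value immediately."""
        self.pending_ms += offset_ms

    def on_interrupt(self) -> float:
        """Advance the clock by one tick, applying a bounded part of the correction."""
        step = max(-self.max_adjust_ms, min(self.max_adjust_ms, self.pending_ms))
        self.pending_ms -= step
        self.value_ms += self.tick_ms + step
        return self.value_ms


clock = SlewedClock()
clock.correct(-3.0)  # the clock is 3 ms fast, so it must lose 3 ms
readings = [clock.on_interrupt() for _ in range(5)]
print(readings)  # three ticks of 9 ms, then normal 10 ms ticks
```

Because `max_adjust_ms` is smaller than `tick_ms`, every tick still moves the clock forward, which is the reason for slowing a fast clock instead of setting it back.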

Page 16: Distributed Synchronization

Cristian’s Algorithm

• One machine knows the true time
• Periodically each machine sends a request for the current time

[Figure: timeline of one exchange. The client sends its request at T0 and receives the reply carrying C_UTC at T1; I is the interrupt handling time at the time server. T0 and T1 are measured with the same clock.]

Page 17: Distributed Synchronization

Transit Time

• Estimating propagation time
  – ( T1 – T0 ) / 2
  – ( T1 – T0 – I ) / 2
  – If the minimum possible propagation delay is known, the estimate can be made better
• Accuracy can be improved by taking several measurements
  – Any measurement in which T1 – T0 exceeds a threshold is discarded (congestion)
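The estimate above can be written out as a short sketch. The function names and the sample readings are invented for illustration. Averaging the surviving offsets is one reasonable choice and is an assumption here; the slides only say that slow measurements are discarded.

```python
def cristian_offset(t0, t1, server_time, interrupt_time=0.0):
    """Estimated correction to add to the local clock.

    t0, t1         -- local clock when the request was sent / the reply arrived
    server_time    -- C_UTC carried in the reply
    interrupt_time -- I, the server's interrupt handling time (0 if unknown)
    """
    one_way = (t1 - t0 - interrupt_time) / 2
    return server_time + one_way - t1


def filtered_offset(samples, max_round_trip):
    """Average the offsets of all samples whose T1 - T0 is within the threshold."""
    good = [cristian_offset(t0, t1, c) for t0, t1, c in samples
            if t1 - t0 <= max_round_trip]
    if not good:
        return None  # every measurement was congested; try again later
    return sum(good) / len(good)


samples = [
    (100, 110, 205),   # round trip 10
    (200, 260, 330),   # round trip 60: discarded as congested
    (300, 308, 404),   # round trip 8
]
print(filtered_offset(samples, max_round_trip=20))
```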

Page 18: Distributed Synchronization

ICMP Timestamp Request/Reply

[Figure: message layout]
Type (13 or 14) | Code (0) | Checksum
Identifier | Sequence Number
32-bit originate timestamp
32-bit receive timestamp
32-bit transmit timestamp

• rtt: both readings come from the same clock, so the difference is accurate
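The message layout can be reproduced with Python's `struct` module. This is a sketch that only builds the 20-byte message; sending it would need a raw socket and is left out. RFC 792 assigns type 13 to Timestamp and type 14 to Timestamp Reply, and defines the timestamps as milliseconds since midnight UT. The helper names and the sample identifier are assumptions made for the example.

```python
import struct

ICMP_TIMESTAMP = 13        # Timestamp request (RFC 792)
ICMP_TIMESTAMP_REPLY = 14  # Timestamp reply (RFC 792)
LAYOUT = "!BBHHHIII"       # type, code, checksum, id, seq, originate, receive, transmit


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_timestamp_request(ident: int, seq: int, originate_ms: int) -> bytes:
    """Build a Timestamp request; receive and transmit fields are left at zero."""
    unchecked = struct.pack(LAYOUT, ICMP_TIMESTAMP, 0, 0, ident, seq, originate_ms, 0, 0)
    checksum = internet_checksum(unchecked)
    return struct.pack(LAYOUT, ICMP_TIMESTAMP, 0, checksum, ident, seq, originate_ms, 0, 0)


packet = build_timestamp_request(ident=0x1234, seq=1, originate_ms=36_000_000)
fields = struct.unpack(LAYOUT, packet)
print(len(packet), fields)
```

A receiver can validate the message by running `internet_checksum` over the whole packet: a correct checksum makes the result zero.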

Page 19: Distributed Synchronization

The Berkeley Algorithm

• Time server is active, and polls each machine periodically for its time
  – Based on the answers, an average time is computed
  – A fault-tolerant average is used
  – Machines are then told to slow down, or speed up, their clocks
• Suitable for systems where no UTC source is available

Page 20: Distributed Synchronization

Berkeley Algorithm

[Figure: the time server's clock reads 740 and it polls two machines. Machine A reports 720 over a link with network delay 10, giving adjusted time A = 730. Machine B reports 737 over a link with network delay 5, giving adjusted time B = 742. The average of 740, 730 and 742 is 737, so machine A is told to move its clock forward by 7.]
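The numbers in this example can be reproduced with a small sketch. It uses a plain average; the fault-tolerant averaging mentioned on the previous slide (for instance, ignoring readings that lie far from the rest) is deliberately left out. The function name and the dictionary layout are assumptions for illustration.

```python
def berkeley_round(server_time, client_readings):
    """One polling round of the Berkeley algorithm with a plain average.

    client_readings maps a machine name to (reported_time, network_delay).
    Returns the agreed time and the correction each participant should apply
    to its delay-adjusted clock value.
    """
    adjusted = {name: t + delay for name, (t, delay) in client_readings.items()}
    values = [server_time] + list(adjusted.values())
    average = round(sum(values) / len(values))
    corrections = {name: average - value for name, value in adjusted.items()}
    corrections["server"] = average - server_time
    return average, corrections


average, corrections = berkeley_round(740, {"A": (720, 10), "B": (737, 5)})
print(average, corrections)  # 737 {'A': 7, 'B': -5, 'server': -3}
```

Negative corrections would normally be applied by slowing the clock down gradually, not by setting it back.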

Page 21: Distributed Synchronization


Network Time Protocol

• NTP is used to synchronize the time of a computer client to another server or reference time source

• Client accuracies are typically within a millisecond on LANs and up to a few tens of milliseconds on WANs

• NTP configurations utilize multiple redundant servers and diverse network paths in order to achieve high accuracy and reliability

• Configurations can use authentication to prevent accidental or malicious protocol attacks

Page 22: Distributed Synchronization

NTP Strata

[Figure: NTP server hierarchy, top to bottom]
• Primary servers (stratum 1) sync to a UTC source
• Secondary servers (stratum 2) sync to primary servers
• Workstations

Page 23: Distributed Synchronization


USNA NTP Time Servers

Page 24: Distributed Synchronization

Rules of Engagement

• Clients should avoid using the primary servers whenever possible
  – In most cases the accuracy of the NTP secondary (stratum 2) servers is only slightly degraded relative to the primary servers
  – As a group, the secondary servers may be just as reliable

Page 25: Distributed Synchronization

When to Use a Primary

• As a general rule, a secondary server should use a primary server only if
  – The secondary server provides synchronization to a sizable population of other servers and clients
  – The server operates with at least two and preferably three other secondary servers in a common synchronization subnet
  – The administration(s) operating these servers coordinate other servers within the region, in order to reduce the resources required outside that region
• In order to ensure reliability, clients should spread their use over many different servers

Page 26: Distributed Synchronization

NTP Servers

• http://www.ntp.org (home page for NTP)
• List of Primary Servers (100)
  – http://www.eecis.udel.edu/~mills/ntp/clock1.htm
• List of Secondary Servers (110)
  – http://www.eecis.udel.edu/~mills/ntp/clock2.htm
• Our server
  – timehost.cs.rit.edu

Page 27: Distributed Synchronization

Synchronization Modes

• Servers synchronize in one of three modes
  – Multicast
    • Used on high-speed LANs
    • Servers periodically broadcast their time
    • Low accuracy, but efficient
  – Procedure-call
    • Similar to the operation of Cristian’s algorithm
  – Symmetric
    • Used by master servers
    • Pairs of servers exchange information
    • Timing data is retained in order to improve accuracy

Page 28: Distributed Synchronization

NTP Design Goals

• The four primary design goals of NTP are
  – Allow accurate UTC synchronization
  – Enable survival despite significant losses of connectivity
  – Allow frequent resynchronization
  – Protect against malicious or accidental interference

Page 29: Distributed Synchronization

Accurate Synchronization

• NTP provides the following information relative to the primary server
  – Clock offset
    • Difference between the two clocks
  – Round-trip delay
    • Total transmission time for the messages
  – Dispersion
    • Offsets are predicted
    • Dispersion is a measure of how much the prediction differs from what was reported
    • Large dispersion values indicate inaccuracy
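Clock offset and round-trip delay are derived from four timestamps taken during one request/reply exchange. The formulas below are the standard ones from the NTP specification, not from the slides; the function name and the sample values are assumptions. The offset estimate is exact only when the two directions of the path take equally long.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Offset of the server clock relative to the client, and round-trip delay.

    t1 -- client clock when the request left the client
    t2 -- server clock when the request arrived
    t3 -- server clock when the reply left the server
    t4 -- client clock when the reply arrived
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Server 55 units ahead of the client, 5 units each way, 1 unit of server processing.
offset, delay = ntp_offset_delay(t1=100, t2=160, t3=161, t4=111)
print(offset, delay)  # 55.0 10
```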

Page 30: Distributed Synchronization

Logical Clocks

• Since physical clocks cannot be perfectly synchronized across a distributed system
  – Physical time cannot be used to determine the order in which events occur
• Logical clocks can be used to order events within a distributed system
• The essence of a logical clock is the happens-before relationship

Page 31: Distributed Synchronization

Happens-Before

• The happens-before relationship is denoted a → b
  – If a and b are events in the same process, and a occurs before b, then a → b
  – If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b
  – If a → b, and b → c, then a → c
• Any two events that are not in a happens-before relationship are concurrent

Page 32: Distributed Synchronization

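Events

[Figure: three process timelines along a physical-time axis. Events a and b occur at p1, c and d at p2, e and f at p3. Message m1 is sent at b and received at c; message m2 is sent at d and received at f.]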

Page 33: Distributed Synchronization

Lamport’s Logical Clock

• To obtain logical ordering, timestamps that are independent of physical clocks are used
• Lamport clocks follow these rules
  – Each process increments its clock between every two consecutive events
  – If a sends a message to b, the message includes T(a). Upon receipt, b sets its clock to the greater of T(a)+1 and the current clock
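The two rules fit in a few lines. In this sketch the receive step uses max(local, received) + 1, which applies both rules at once: the clock ticks for the receive event and ends up greater than T(a). The class and method names are assumptions, and the event pattern (a, b at p1; c, d at p2; e, f at p3; m1 from b to c; m2 from d to f) is an assumed reading of the event diagram.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self):
        self.time = 0

    def local_event(self) -> int:
        """Tick for an internal event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Tick for a send event; the returned timestamp travels with the message."""
        self.time += 1
        return self.time

    def receive(self, message_time: int) -> int:
        """Tick for a receive event, jumping past the sender's timestamp if needed."""
        self.time = max(self.time, message_time) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.local_event()
b = p1.send()        # m1 carries b's timestamp
c = p2.receive(b)
d = p2.send()        # m2 carries d's timestamp
e = p3.local_event()
f = p3.receive(d)
print(a, b, c, d, e, f)  # 1 2 3 4 1 5
```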

Page 34: Distributed Synchronization

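Lamport’s Algorithm

[Figure: three processes whose clocks tick at different rates exchange messages, shown before and after Lamport’s correction.
Clock readings before correction: 0, 1, 2, …, 10 / 0, 8, 16, …, 80 / 0, 10, 20, …, 100.
Clock readings after correction: 0, 1, …, 8, 70, 71 / 0, 8, …, 48, 61, 69, 77, 85 / 0, 10, 20, …, 100.
Events ordered by timestamp before correction: A(6) F(10) B(24) C(50) D(60) E(64).
Events ordered by timestamp after correction: A(6) B(24) C(50) D(60) E(64) F(71).]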

Page 35: Distributed Synchronization

Partial Ordering

• If a → b then L(a) < L(b)
• Note that the converse does not hold
  – L(d) < L(e) does not imply that d → e, since d and e might be concurrent
• In addition, L(a) might equal L(b) for two distinct events in different processes

Page 36: Distributed Synchronization

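Example

[Figure: the event diagram for p1, p2 and p3 along a physical-time axis, annotated with Lamport timestamps: a = 1, b = 2 (p1); c = 3, d = 4 (p2); e = 1, f = 5 (p3). Message m1 goes from b to c and m2 from d to f.]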

Page 37: Distributed Synchronization

Total Ordering

• Between every two events the clock must tick at least once
• Since events cannot happen at the same time, attach the process number to the low-order end of the time, separated by a decimal point
• Now
  – If a happens before b in the same process, C(a) < C(b)
  – If a and b represent the sending and receiving of a message, C(a) < C(b)
  – For all distinct events a and b, C(a) is not equal to C(b)
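A sketch of the tie-breaking idea: instead of writing the process number after a decimal point, the example keeps (time, process number) as a pair and compares pairs lexicographically. This is an equivalent representation chosen here because it avoids ambiguity once process numbers have more than one digit (1.10 versus 1.2). The timestamps are an assumed reading of the earlier Lamport example figure.

```python
# (Lamport time, process number) for each event; values assumed from the example figure.
stamped = {
    "a": (1, 1), "b": (2, 1),
    "c": (3, 2), "d": (4, 2),
    "e": (1, 3), "f": (5, 3),
}


def total_order(events):
    """Event names sorted by Lamport time, with ties broken by process number."""
    return sorted(events, key=lambda name: events[name])


order = total_order(stamped)
print(order)  # ['a', 'e', 'b', 'c', 'd', 'f']
```

Events a and e share Lamport time 1, and the process number alone decides that a comes first; the choice is arbitrary but every process makes it the same way.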

Page 38: Distributed Synchronization

Vector Clocks

• Vector clocks were designed to overcome the shortcomings of Lamport’s clocks
  – A vector clock is an array of times
• The rules:
  – Initially, Vi[j] = 0, for i, j = 1, 2, …, N
  – Just before pi timestamps an event, it increments Vi[i]
  – pi includes the value t = Vi in every message it sends
  – When pi receives a timestamp in a message, it takes the component-wise maximum of the two vector timestamps
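The rules above translate into a small class. This is a sketch with invented names; processes are indexed from 0 in the code while the slides count from 1. The receive step merges first and then increments the receiver's own entry, because the receipt is itself an event that gets a timestamp. The event pattern is an assumed reading of the event diagram (m1 from b to c, m2 from d to f).

```python
class VectorClock:
    """Vector clock for process number `index` out of `n` processes."""

    def __init__(self, index: int, n: int):
        self.index = index
        self.v = [0] * n

    def _tick(self) -> None:
        self.v[self.index] += 1

    def local_event(self) -> tuple:
        self._tick()
        return tuple(self.v)

    def send(self) -> tuple:
        """Timestamp the send event; the result is attached to the message."""
        self._tick()
        return tuple(self.v)

    def receive(self, t: tuple) -> tuple:
        """Merge the message timestamp, then timestamp the receive event."""
        self.v = [max(own, other) for own, other in zip(self.v, t)]
        self._tick()
        return tuple(self.v)


p1, p2, p3 = (VectorClock(i, 3) for i in range(3))
a = p1.local_event()
b = p1.send()       # m1
c = p2.receive(b)
d = p2.send()       # m2
e = p3.local_event()
f = p3.receive(d)
print(a, b, c, d, e, f)
```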

Page 39: Distributed Synchronization

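Example

[Figure: the event diagram for p1, p2 and p3 along a physical-time axis, annotated with vector timestamps: a = (1,0,0), b = (2,0,0) (p1); c = (2,1,0), d = (2,2,0) (p2); e = (0,0,1), f = (2,2,2) (p3). Message m1 goes from b to c and m2 from d to f.]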

Page 40: Distributed Synchronization

Comparing Timestamps

• Vector timestamps are compared as follows
  – V = V’ iff V[j] = V’[j] for j = 1, 2, …, N
  – V <= V’ iff V[j] <= V’[j] for j = 1, 2, …, N
  – V < V’ iff V <= V’ and V != V’
• So what?
  – If V(e) < V(e’) then e → e’
  – c and e are concurrent since neither V(c) <= V(e) nor V(e) <= V(c)
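The comparison rules can be sketched as three small functions. The names are illustrative; the timestamps are the ones from the vector clock example, with c = (2,1,0) and e = (0,0,1).

```python
def leq(v, w) -> bool:
    """V <= W: every component of V is at most the matching component of W."""
    return all(x <= y for x, y in zip(v, w))


def less(v, w) -> bool:
    """V < W: V <= W and the two timestamps differ."""
    return leq(v, w) and tuple(v) != tuple(w)


def concurrent(v, w) -> bool:
    """Neither timestamp is <= the other, so neither event happened before the other."""
    return not leq(v, w) and not leq(w, v)


b, c, e, f = (2, 0, 0), (2, 1, 0), (0, 0, 1), (2, 2, 2)
print(less(b, c))        # True: b happened before c
print(concurrent(c, e))  # True: c and e are concurrent
print(less(e, f))        # True: e happened before f
```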