Undergraduate course on Real-time Systems, Linköping

TDDD07 Real-time Systems

Lecture 4: Distributed Systems

Simin Nadjm-Tehrani

Real-time Systems Laboratory

Department of Computer and Information Science, Linköping University

43 pages, Autumn 2009

Reading material

• Course book: Chapters 6.8 and 9.1 of Burns & Wellings (not the language-specific parts)

• E-books at LiU library:
– See web page links

This lecture

• Overview of some basic notions in timing and how distributed systems are affected by them

• Time, clock synchronisation, order of events, and logical clocks

...

Applications

• Banking systems
• On-line access & electronic services
• Peer-to-Peer networks
• Distributed control
– Cars, Airplanes
• Sensor and ad hoc networks
– Buildings, Environment
• Grid computing

Common in all these?

Distributed model of computing:

• Multiple processes
• Disjoint address spaces
• Inter-process communication
• Collective goal

Synchrony vs. asynchrony

• Model for distributed computations depends on
– the rate at which computations are done at each node (process)
– the expected delay for transmission of messages

• Synchronous: There is a bound on message delays, and the rates of computation at different processes can be related

• Asynchronous: No bounds on message delays and no known relation among the processing speeds at different nodes

The choice

• Which model is harder to use?

• What does it mean to be hard or easy to use?

• How do implementations of real systems relate to the various models?

Implications

• Synchronous:
– Local clocks can be used to implement timeouts
– Lack of response from another node can be interpreted as detection of failure

• Asynchronous:
– In the absence of global (synchronised) time the only system-wide abstraction of time is order of events
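To illustrate the synchronous case, here is a minimal sketch of timeout-based failure detection. The function name `suspect_failed`, the node names and the timeout value are hypothetical, and the sketch assumes every correct node is known to send a message at least once per timeout period (which only holds when message delays are bounded).

```python
def suspect_failed(last_heard: dict[str, float], now: float, timeout: float) -> set[str]:
    """Return the nodes whose most recent message is older than `timeout`.

    Only sound in a synchronous model: with unbounded delays a slow node
    cannot be told apart from a failed one.
    """
    return {node for node, t in last_heard.items() if now - t > timeout}


# Hypothetical local-clock readings (seconds) of when each node was last heard from.
last_heard = {"brake_front_left": 9.98, "brake_front_right": 9.40}
suspects = suspect_failed(last_heard, now=10.0, timeout=0.1)
print(suspects)  # {'brake_front_right'}
```

In an asynchronous model the same check would only yield a suspicion, never a detection.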

Reasons for distribution

• Locality
– Engine control, brake system, gearbox control, airbag,…

• Organisation
– An extension of modularisation, and means for fault containment

• Load sharing
– Web services, search, parallelisation of heavy duty computations

Local control

Simplistic view:

• It is all about data: each local controller can perform its computations properly if the data it needs can be accessed locally

• Design modules with high cohesion and low interaction!

• But when data needs to be shared, how do we ensure that nodes have fresh data and act in concert with other nodes?

Organisation and containment

Simplistic view:

• If module interactions are well-defined they do not affect each other even if things go wrong

• But fault tolerance is a much harder problem in distributed systems, and timing has a big role in it

More on this in dependability lecture

Sharing the load

Simplistic view:

• Guarantee that a node can deal with what it accepts

• Spread the load so that tasks are (globally) serviced in a best effort manner

• But communication and cooperation overheads affect the global distributed service

Common issues

• Time: Sharing data may require knowledge of local time at the generating node, and comparison with the time at the consuming node

• State: Sometimes nodes need to agree on a common state/value in order to achieve a globally correct behaviour

• Faults in the system affect both

Major requirements

• In distributed systems:
– Interoperability
– Transparency
– Scalability
– Dependability

• This course focuses on dependability: fault tolerance and timing related issues

Brake-by-wire

Contributing to safety

• Redundancy: Having distributed sensors and actuators makes brake control more fault-tolerant

• Central decision:
– what if one node gets the signal incorrectly or late?

• Distributed decision:
– what if one node is acting differently?

Central decision or distributed decision?

Time in Distributed Systems

• The role of time in distributed systems
• Logical time vs. physical time
• Clock synchronisation algorithms
• Vector clocks

Time matters…

• Inaccurate local clocks can be a problem if the result of computations at different nodes depends on time
– Calculation of trajectories: if a missile was at a given point at a given time before a computation, where will it be after the computation?
– If the brake signal is issued separately in different wheels will the car stop, and when?

Banking and finance

• The rate of interest is applied to funds
– at a given point in time
– to a balance that reflects related transactions prior to that point

• The gain/loss on sales of stocks is dependent on dynamic values of stocks at a given time (the time of sale/purchase)

Local vs. global clock

• Most physical (local) clocks are not always accurate

• What is meant by accurate?
– Agreement with UTC
– Coordinated Universal Time (UTC) is based on International Atomic Time (TAI), and is adjusted with leap seconds to follow the variations in the rotation of the Earth

• Local clocks need to be synchronised regularly

• An atomic global clock accurately measures TAI

• If local clocks are synchronised with an (accurate) global clock we may be able to use a synchronous model in the application

Clock synchronisation

Two types of algorithms:

• Internal synchronisation
– Tries to keep a set of clock values close to each other with a maximum skew of δ

• External synchronisation
– Tries to keep the values of a set of clocks in agreement with an accurate clock, with a maximum skew of δ
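The difference between the two goals can be sketched as two checks on a set of clock readings. The function names and the sample values (in milliseconds) are hypothetical and serve only as illustration.

```python
def internally_synchronised(clocks, delta):
    """True if every pair of clocks differs by at most delta."""
    return max(clocks) - min(clocks) <= delta


def externally_synchronised(clocks, reference, delta):
    """True if every clock is within delta of the accurate reference clock."""
    return all(abs(c - reference) <= delta for c in clocks)


clocks = [1000, 1004, 1008]   # hypothetical local clock readings
reference = 1020              # hypothetical accurate clock reading
delta = 10

print(internally_synchronised(clocks, delta))             # True: spread is 8
print(externally_synchronised(clocks, reference, delta))  # False: 1000 is 20 away
```

The clocks agree with each other but not with the reference. In the other direction, clocks that are externally synchronised within δ are always within 2δ of each other.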

Lamport/Melliar-Smith Algorithm

• Internal synchronisation of n clocks

• Each clock reads the value of all other clocks at regular intervals
– If the value of some clock differs from its own clock by more than δ, that clock value is replaced by its own clock value
– The average of all clocks is computed
– Its own clock value is updated to the average value
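One resynchronisation round at a single node, as described in the steps above, might look as follows. This is a minimal sketch: the name `sync_step` is hypothetical, and reading delays and drift between rounds are ignored.

```python
def sync_step(own, readings, delta):
    """One round at one node.

    own      -- this node's clock value
    readings -- values read from all the other clocks
    delta    -- maximum accepted difference from the own clock
    """
    # A reading further than delta from the own clock is replaced by the own value.
    adjusted = [r if abs(r - own) <= delta else own for r in readings]
    # The average is taken over all n clocks, the own clock included.
    return (own + sum(adjusted)) / (len(adjusted) + 1)


new_value = sync_step(own=100, readings=[102, 98, 150], delta=5)
print(new_value)  # 100.0: the reading 150 is discarded and replaced by 100
```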

Does it work?

• After each synchronisation interval the clocks get closer to each other

• If the drifts are within δ, and the clocks are initially synchronised then they are kept within δ from each other

• But what if some clocks give faulty values?

Faulty clocks

• If a clock drifts by more than δ its value is eliminated – does not “harm” other clocks

• What if it drifts by exactly δ? – check it as an exercise!

• What is the worst case?

A two-face faulty clock k

[Figure: clocks i and j and a faulty clock k, with clock values labelled c, c−2δ, c−δ and c+δ. Clock k will be considered as correct by i and j…]

Bound on the faulty clocks

• To guarantee that the set of clocks stays within δ we need an assumption on the number of faulty clocks

• For t faulty clocks the algorithm works if the number of clocks n > 3t
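A small numeric sketch of why the bound matters: with n = 3 and one two-faced clock, the two correct clocks can be kept from converging. The helper `sync_step` restates the averaging round under the same simplifications (no drift, no reading delay), and all values are hypothetical.

```python
def sync_step(own, readings, delta):
    """Averaging round: readings further than delta from own are replaced by own."""
    adjusted = [r if abs(r - own) <= delta else own for r in readings]
    return (own + sum(adjusted)) / (len(adjusted) + 1)


def enough_clocks(n, t):
    """The bound from the lecture: n clocks tolerate t faulty ones if n > 3t."""
    return n > 3 * t


delta = 10
i, j = 100, 110   # two correct clocks, exactly delta apart

# Honest third clock k reports 105 to both: i and j converge.
honest = (sync_step(i, [j, 105], delta), sync_step(j, [i, 105], delta))

# Two-faced k reports 90 to i and 120 to j; both values pass the delta check.
faulty = (sync_step(i, [j, 90], delta), sync_step(j, [i, 120], delta))

print(honest)               # (105.0, 105.0)
print(faulty)               # (100.0, 110.0): no progress towards each other
print(enough_clocks(3, 1))  # False
print(enough_clocks(4, 1))  # True
```

With clock drift added between rounds, the two correct clocks in the faulty case could move more than δ apart.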

Logical time

• Sometimes order will do

• In the absence of exact synchronisation we may use order that is intrinsic in an application

[Figure: message exchange between Client A, Client B and Server, with messages ReqA, RepA and ReqB]

Logical clocks

• Based on event counts at each node

• May reflect causality

• Sending a message always precedes receiving it

• Messages sent in a sequence by one node are (potentially) causally related to each other
– I do not pay for an item if I do not first check the item’s availability

Happened before (→)

• Assume each process has a monotonically increasing physical clock

• Rule 1: if a and b are events at the same process and the time for event a is before the time for event b then a → b

• Rule 2: if a denotes sending a message and b denotes receiving the same message then a → b

• Rule 3: → is transitive

A partial order

• Any events that are not in the “happened before” relation are treated as concurrent

• Logical clock: An event counter that respects the “happened before” ordering

• Sometimes referred to as Lamport’s clocks (author of the first paper on this topic: 1978)

What do we know here?

[Figure: space-time diagram of processes P, Q and R with events a, b, c, d, e, f, g and h]

Implementing a logical clock

• LC1: Each time a local event takes place increment LC by 1

• LC2: Each time a message m is sent the LC value at the sender is appended to the message (m_LC)

• LC3: Each time a message m is received set LC to max(LC, m_LC)+1
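Rules LC1 to LC3 can be sketched as a small class. The class and method names are hypothetical. The sketch also assumes that sending a message counts as a local event, so `send` applies LC1 before LC2.

```python
class LamportClock:
    """Logical clock of a single process."""

    def __init__(self):
        self.lc = 0

    def local_event(self):
        # LC1: increment on every local event.
        self.lc += 1
        return self.lc

    def send(self):
        # Assumption: sending is itself an event (LC1), then the current
        # value is attached to the message as m_LC (LC2).
        self.lc += 1
        return self.lc

    def receive(self, m_lc):
        # LC3: move past both the own count and the sender's count.
        self.lc = max(self.lc, m_lc) + 1
        return self.lc


p, q = LamportClock(), LamportClock()
p.local_event()          # LC = 1 at p
m_lc = p.send()          # LC = 2 at p, message carries 2
q.local_event()          # LC = 1 at q
print(q.receive(m_lc))   # 3 = max(1, 2) + 1
```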

Exercise

• Calculate LC for all events in the given example

What does LC tell us?

• a → b ⇒ LC(a) < LC(b)

• Note that:

LC(d) < LC(h) does not imply d → h
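A tiny sketch of why the converse fails, using two hypothetical processes that never exchange messages (so none of their events are causally related):

```python
# Logical clocks of two processes P and Q; no messages are exchanged.
lc_p = 0
lc_q = 0

lc_p += 1          # event x at P
x = lc_p

lc_q += 1          # first event at Q
lc_q += 1          # event y at Q
y = lc_q

print(x, y)        # 1 2
print(x < y)       # True, although x and y are concurrent
```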

Is concurrency transitive?

• e is concurrent with g

• g is concurrent with f

• but e is not concurrent with f!

• Vector clocks bring more...

Vector clocks (VC)

• Every node maintains a vector of counted events (one entry for each node)

• VC for event e, VC(e) = [c_1,…,c_n], shows the perceived count of events at nodes 1,…,n

• VC(e)[k] denotes the entry for node k

Example revisited

[Figure: space-time diagram of processes P, Q and R with events a, b, c, d, e, f, g and h]

Implementation of VC

• Rule 1: For each local event increment own entry

• Rule 2: When sending message m, append to m the VC(send(m)) as a timestamp T

• Rule 3: When receiving a message at node i,
– increment own entry: VC[i] := VC[i]+1
– For every entry j in the VC: Set the entry to max(T[j], VC[j])
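The three rules can be sketched as follows. The class name is hypothetical, and sending is assumed to count as a local event (Rule 1 followed by Rule 2). The message pattern in the usage part is also an assumption, chosen so that the resulting timestamps match those listed in the lecture's example.

```python
class VectorClock:
    """Vector clock of node `i` in a system of `n` nodes."""

    def __init__(self, i, n):
        self.i = i
        self.vc = [0] * n

    def local_event(self):
        # Rule 1: increment own entry.
        self.vc[self.i] += 1
        return list(self.vc)

    def send(self):
        # Assumption: sending is an event (Rule 1); the copy returned
        # is the timestamp T attached to the message (Rule 2).
        self.vc[self.i] += 1
        return list(self.vc)

    def receive(self, t):
        # Rule 3: increment own entry, then take the entry-wise maximum.
        self.vc[self.i] += 1
        self.vc = [max(tj, vj) for tj, vj in zip(t, self.vc)]
        return list(self.vc)


P, Q, R = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
t1 = Q.send()             # [0, 1, 0]
print(P.receive(t1))      # [1, 1, 0]
t2 = P.send()             # [2, 1, 0]
R.local_event()           # [0, 0, 1]
R.local_event()           # [0, 0, 2]
print(R.receive(t2))      # [2, 1, 3]
t3 = R.send()             # [2, 1, 4]
print(Q.receive(t3))      # [2, 2, 4]
```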

Example

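[Figure: the example diagram annotated with vector clocks. All three processes start at [0,0,0]; the labelled timestamps are [0,1,0], [1,1,0], [2,1,0], [0,0,1], [0,0,2], [2,1,3], [2,1,4] and [2,2,4]]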

Concurrent events in VC

• Relation < on vector clocks defined by: VC(a) < VC(b) iff
– For all i: VC(a)[i] ≤ VC(b)[i]
– For some i: VC(a)[i] < VC(b)[i]

• An event a precedes another event b if VC(a) < VC(b)

• If neither VC(a) < VC(b) nor VC(b) < VC(a) then a and b are concurrent
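The comparison and the concurrency test can be sketched directly from the definition. The function names are hypothetical; the sample vectors are timestamps taken from the lecture's example.

```python
def vc_less(a, b):
    """VC(a) < VC(b): no entry larger, at least one entry strictly smaller."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def concurrent(a, b):
    """Neither timestamp precedes the other.

    Meant for timestamps of two distinct events, which never have
    identical vector clocks.
    """
    return not vc_less(a, b) and not vc_less(b, a)


print(vc_less([0, 1, 0], [1, 1, 0]))     # True: precedes
print(vc_less([0, 0, 2], [2, 1, 3]))     # True: precedes
print(concurrent([2, 1, 0], [0, 0, 2]))  # True: each is ahead in some entry
```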

Pros and cons

• Vector clocks are a simple means of capturing “known” precedence

• VC(a) < VC(b) ⇒ a → b

• For large systems we have resource issues (bandwidth wasted), and maintainability issues

• Vector clocks help to synchronise at event level
– Consistent snapshots

• But reasoning about response times and fault tolerance needs quantitative bounds

Distribution & Fault tolerance

– Distribution introduces new complications
• no global clock
• richer failure models

+ Replication and group mechanisms
• transparency in treatment of faults

We will come back to faults in lecture 6, and see that synchronisation is needed for tolerating some faults