lecture 2 introduction to principles of distributed computing

Sergio Rajsbaum 2006 Lecture 2 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico


Page 1: Lecture 2 Introduction to Principles of Distributed Computing

Sergio Rajsbaum 2006

Lecture 2: Introduction to Principles of Distributed Computing

Sergio Rajsbaum

Math Institute, UNAM, Mexico

Page 2: Lecture 2 Introduction to Principles of Distributed Computing


Lecture 2

• Part I: Refresh from Lecture I. What is a distributed system and its parameters. Problems solved in such a system. The need for a theoretical foundation. Two-phase commit

• Part II: Coordinated attack, consensus

Page 3: Lecture 2 Introduction to Principles of Distributed Computing


Part I: What is a distributed system

The need for a theoretical foundation. Two-phase commit

Page 4: Lecture 2 Introduction to Principles of Distributed Computing


Principles of Distributed Computing

• Distributed computing studies systems where components interact and collaborate

• Principles of distributed computing tries to understand the fundamental possibilities and limitations of such systems, with a precise, scientific approach

• Goal: to design efficient and reliable systems, and techniques to design them, analyze them and prove them correct, or to prove impossibility results when no protocol exists

Page 5: Lecture 2 Introduction to Principles of Distributed Computing


What is distributed computing?

• Any system where several independent computing components interact

• This broad definition encompasses

– VLSI chips, and any modern PC

– tightly-coupled shared memory multiprocessor

– local area cluster of workstations

– internet, WEB, Web services

– wireless networks, sensor networks, ad-hoc networks

– cooperating robots, mobile agents, P2P systems

Page 6: Lecture 2 Introduction to Principles of Distributed Computing


Computing components

• Referred to as processors or processes in the literature

• Can represent a
– microprocessor
– process in a multiprocessing operating system
– Java thread
– mobile agent, mobile node (e.g. laptop), robot
– computing element in a VLSI chip

Page 7: Lecture 2 Introduction to Principles of Distributed Computing


Interaction – message passing vs. shared memory

• Processors need to communicate with each other to collaborate, via

• Message passing
– Point-to-point channels, defining an interconnection graph
– All-to-all using an underlying infrastructure (e.g. TCP/IP)
– Broadcast; wireless, satellite

• Shared memory
– Shared objects: read/write, test&set, compare&swap, etc.
– Usually harder to implement, easier to program
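A minimal sketch of the two interaction styles, using Python threads inside one program as stand-ins for processors. The function names `message_passing_sum` and `shared_memory_sum`, the `None` end-of-stream marker, and the use of a lock in place of a hardware primitive such as test&set are assumptions of this illustration, not part of the lecture.

```python
import queue
import threading


def message_passing_sum(values):
    """A sender thread passes each value over a channel; the receiver adds them up."""
    channel = queue.Queue()      # point-to-point channel (assumed reliable and FIFO)
    result = []

    def sender():
        for v in values:
            channel.put(v)       # send(v)
        channel.put(None)        # end-of-stream marker (an assumption of this sketch)

    def receiver():
        total = 0
        while True:
            m = channel.get()    # blocks until a message arrives
            if m is None:
                break
            total += m
        result.append(total)

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


def shared_memory_sum(values, workers=4):
    """Worker threads add into one shared register protected by a lock."""
    shared = {"sum": 0}
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with lock:           # makes the read-modify-write step atomic
                shared["sum"] += v

    threads = [threading.Thread(target=worker, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["sum"]
```

In the first function no state is shared and all coordination goes through the channel; in the second the threads never talk to each other and coordinate only through the shared object.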

Page 8: Lecture 2 Introduction to Principles of Distributed Computing


A distributed system

[Figure: processors collaborate through a communication medium]

Page 9: Lecture 2 Introduction to Principles of Distributed Computing


Failures

• Any system that includes many components running over a long period of time must consider the possibility of failures

• of processors and communication media

• of different severity
– from processor crashes or message losses, to
– malicious Byzantine behavior

Page 10: Lecture 2 Introduction to Principles of Distributed Computing


Many kinds of problems

• Clock synchronization
• Routing
• Broadcasting
• Naming
• P2P, how to share and find resources
• Sharing resources, mutual exclusion
• Increasing fault-tolerance, failure detection
• Security, authentication, cryptography
• Database transactions, atomic commitment
• Backups, reliable storage, file systems
• Applications, airline reservation, banking, electronic commerce, publish/subscribe systems, web search, web caching, …

Page 11: Lecture 2 Introduction to Principles of Distributed Computing


Multi-layered, complex interactions: an example

• A fault-tolerant broadcast service is useful to build a higher-level database transaction module
• Naming and authentication are required
• And it may work more efficiently if clocks are tightly synchronized
• And good routing schemes should exist
• If the clock synchronization is attacked, the whole system may be compromised

Page 12: Lecture 2 Introduction to Principles of Distributed Computing


Chaos

We need a good foundation: principles of distributed computing

Page 13: Lecture 2 Introduction to Principles of Distributed Computing


Chaos

• Too many models, problems and orthogonal, interacting issues

• Very hard to get things right, to reproduce operating scenarios

• Sometimes it is easy to adapt a solution to a different model, sometimes a small change in the model makes a problem unsolvable

Page 14: Lecture 2 Introduction to Principles of Distributed Computing


Distributed computing theory

• Models
– Good models [Schneider Ch.2 in Distributed Systems, Mullender (Ed.)]
– Relation between models: solve a problem only once; solve it in the strongest possible model

• Problems
– Search for paradigms that represent fundamental distributed computing issues
– Relations between problems: hierarchies of solvable and unsolvable problems; reductions

• Solutions
– Design algorithms, verification techniques, programming abstractions
– Impossibility results and lower bounds

• Efficiency measures
– Time, communication, failures, recovery time, bottlenecks, congestion

Page 15: Lecture 2 Introduction to Principles of Distributed Computing


Distributed Commit

An example of a distributed protocol

Fundamental part of distributed DBMS

Page 16: Lecture 2 Introduction to Principles of Distributed Computing


Distributed Commit

• A distributed transaction with components at several sites should execute atomically

• Example: A manager of a chain of stores wants to query all the stores, find the inventory of toothbrushes at each, and issue instructions to move toothbrushes from store to store in order to balance the inventory.

• The operation is done by a single global transaction T that has a component Ti at the i-th store and a component T0 at the office where the manager is located.

Page 17: Lecture 2 Introduction to Principles of Distributed Computing


Sequence of activities performed by T

1. Component T0 is created at the site of the manager
2. T0 sends messages to all the stores instructing them to create components Ti
3. Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0
4. T0 takes these numbers and determines, by some algorithm we shall not discuss, what shipments of toothbrushes are desired. T0 then sends messages such as “store 10 should ship 500 toothbrushes to store 7” to the appropriate stores
5. Stores receiving instructions update their inventory and perform the shipments
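The five steps above can be sketched as a small failure-free simulation on one machine. This is only an illustration: the names `plan_shipments` and `run_transaction` are hypothetical, and the balancing rule (level every store to the average) is an assumption of this sketch, since the lecture deliberately leaves the redistribution algorithm unspecified.

```python
def plan_shipments(reported):
    """Step 4 (hypothetical rule): level every store to the average inventory."""
    target = sum(reported.values()) // len(reported)
    surplus = {s: q - target for s, q in reported.items() if q > target}
    deficit = {s: target - q for s, q in reported.items() if q < target}
    shipments = []
    for dst in sorted(deficit):
        need = deficit[dst]
        for src in sorted(surplus):
            if need == 0:
                break
            amount = min(need, surplus[src])
            if amount > 0:
                shipments.append((src, dst, amount))   # "src should ship amount to dst"
                surplus[src] -= amount
                need -= amount
    return shipments


def run_transaction(inventory):
    """inventory maps a store number to its toothbrush count; it is updated in place."""
    # Steps 1-3: T0 creates the components Ti, and each Ti reports its local count.
    reported = dict(inventory)
    # Step 4: T0 decides which shipments are desired and sends the instructions.
    shipments = plan_shipments(reported)
    # Step 5: the stores update their inventory and perform the shipments.
    for src, dst, amount in shipments:
        inventory[src] -= amount
        inventory[dst] += amount
    return shipments
```

With stores 3, 7 and 10 holding 600, 100 and 1100 toothbrushes, the only instruction produced is `(10, 7, 500)`, which matches the message quoted in step 4.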

Page 18: Lecture 2 Introduction to Principles of Distributed Computing


Atomicity

• We must make sure the following never happens: some of the actions of T are executed, but others are not

• We do assume atomicity of each Ti, through mechanisms such as logging and recovery

• Failures make it difficult to achieve atomicity of T
– A site fails or is disconnected from the network
– A bug in the algorithm to redistribute toothbrushes instructs store 10 to ship more than it has

Page 19: Lecture 2 Introduction to Principles of Distributed Computing


Example of failures

• Suppose T10 replies to T0’s first message with its inventory.

• The machine at store 10 then crashes, so the instructions from T0 are never received by T10

• However, T7 sees no problem, and receives the instructions from T0

• Can distributed transaction T ever commit?
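The scenario above can be replayed in a few lines to see why atomicity is lost. The function `deliver_instructions` and the `crashed` parameter are hypothetical names for this sketch, which assumes that a crashed store simply never applies its part of a shipment.

```python
def deliver_instructions(inventory, shipments, crashed=frozenset()):
    """Each store applies only the instructions it actually receives."""
    for src, dst, amount in shipments:
        if src not in crashed:
            inventory[src] -= amount   # the sending store records the shipment
        if dst not in crashed:
            inventory[dst] += amount   # the receiving store books the stock
    return inventory


# T0 instructs store 10 to ship 500 toothbrushes to store 7, but store 10 has crashed.
before = {7: 100, 10: 1100}
after = deliver_instructions(dict(before), [(10, 7, 500)], crashed={10})
```

Here `after` is `{7: 600, 10: 1100}`: store 7 acted on the instruction and store 10 did not, so only some of the actions of T took effect and the books no longer add up to the original 1200 toothbrushes.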

Page 20: Lecture 2 Introduction to Principles of Distributed Computing


Agreement Paradigms

Coordinated attack

Consensus

Page 21: Lecture 2 Introduction to Principles of Distributed Computing


Coordinated Attack: an important abstraction

• A pair of allied generals A and B have agreed to attack simultaneously or not at all.

• They can only communicate via carrier pigeon; message loss is possible

[Figure: generals A and B]

Page 22: Lecture 2 Introduction to Principles of Distributed Computing


Difficulty: uncertainty

• Suppose general A sends B the message “attack at dawn”

• General A won’t attack alone, and A doesn’t know whether B has received the message. B understands A’s predicament, so B sends an acknowledgment: “agreed”

Page 23: Lecture 2 Introduction to Principles of Distributed Computing


Impossible

Theorem: Assume that communication is unreliable. Any protocol that guarantees that if one of the generals attacks, then the other does so at the same time, is a protocol in which necessarily neither general attacks.

[Figure: A sends “attack at dawn” to B and wonders “Did B get it?”; B answers “ack” and wonders “Did A get it?”]

Page 24: Lecture 2 Introduction to Principles of Distributed Computing


It never ends

• There is always uncertainty about whether the last message was delivered or not

• Corollary: If a decision must be made within a fixed time period, then unreliable communication prevents database commitment protocols

[Figure: A sends “ack your ack” to B and wonders “Did B get it?”; B answers “ack your ack to my ack” and wonders “Did A get it?”]
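The never-ending chain of acknowledgments can be replayed with a toy model. The model is an assumption of this sketch, not something the lecture fixes: messages alternate A→B, B→A, and so on, a general's view is just the number of messages it has received, and the hypothetical functions `views` and `indistinguishable_to_sender` compare runs that differ only in whether the last message arrived. It illustrates the uncertainty; it is not a proof of the theorem.

```python
def views(delivered):
    """What each general has seen when the first `delivered` messages of the
    alternating chain A->B, B->A, ... arrive and every later one is lost."""
    received_by_b = (delivered + 1) // 2   # messages 1, 3, 5, ... go to B
    received_by_a = delivered // 2         # messages 2, 4, 6, ... go to A
    return {"A": received_by_a, "B": received_by_b}


def indistinguishable_to_sender(k):
    """Message k is sent by A if k is odd and by B if k is even.
    Return True if its sender has the same view whether or not it is delivered."""
    sender = "A" if k % 2 == 1 else "B"
    return views(k - 1)[sender] == views(k)[sender]
```

`indistinguishable_to_sender(k)` is `True` for every `k`: whoever sent the last message sees exactly the same thing in the run where it arrives and in the run where it is lost, so adding one more acknowledgment only moves the doubt to the other general.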

Page 25: Lecture 2 Introduction to Principles of Distributed Computing


Agreement Problems in Distributed Computing are common

Because processes have different views of the system’s state and history

Page 26: Lecture 2 Introduction to Principles of Distributed Computing


Agreement Problems in Distributed Computing are common…

Because processes have different views of the system’s state and history, due to:

• Delays
• Failures

NASA plunged the Galileo spacecraft into Jupiter’s turbulent atmosphere today. The unmanned spacecraft dived into the atmosphere at 2:57 p.m. Eastern time. The last of Galileo’s data arrived on Earth today after the spacecraft was destroyed, taking 52 minutes to cross half a billion miles of space

The New York Times, 21 Sept. 2003

Page 27: Lecture 2 Introduction to Principles of Distributed Computing


… and Agreement Problems are Important

• In a replicated data system: to execute the same sequence of operations on the replicated data
• In a replicated sensor system: to agree on the values of the sensors
• In a timed system: to synchronize a set of clocks
• In a broadcast system: to deliver the same messages in the same order
• In a database system: to commit or abort a transaction
• Etc.

Page 28: Lecture 2 Introduction to Principles of Distributed Computing


Consensus

The king of agreement problems

Page 29: Lecture 2 Introduction to Principles of Distributed Computing


CONSENSUS: a fundamental abstraction

Each process has an input and should decide an output such that:

Agreement: correct processes’ decisions are the same

Validity: decision is input of one process

Termination: eventually all correct processes decide

There are at least two possible input values 0 and 1
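The three requirements can be written down as a checker over one finished execution. This is a sketch under stated assumptions: the function name `check_consensus` and the dictionary encoding are hypothetical, and since "eventually" cannot be observed on a finite record, termination is approximated by "every correct process has decided by the end of the recorded execution".

```python
def check_consensus(inputs, decisions, correct):
    """inputs:    process id -> input value
    decisions: process id -> decided value (absent or None if undecided)
    correct:   set of ids of the processes that did not fail"""
    decided_by_correct = {decisions[p] for p in correct
                          if decisions.get(p) is not None}
    return {
        # Agreement: correct processes' decisions are the same
        "agreement": len(decided_by_correct) <= 1,
        # Validity: every decision is the input of some process
        "validity": all(v in inputs.values()
                        for v in decisions.values() if v is not None),
        # Termination: all correct processes have decided
        "termination": all(decisions.get(p) is not None for p in correct),
    }
```

For example, with inputs `{1: 0, 2: 1, 3: 1}`, the decisions `{1: 1, 2: 1, 3: 1}` satisfy all three properties, while `{1: 0, 2: 1}` with all three processes correct violates agreement and termination but not validity.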

Page 30: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus For a group of people sitting in a room

Page 31: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus: each one raises a card with its input

[Figure: five raised cards showing the inputs 2, 0, 0, 1, 0]

Page 32: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus Follow a coordinator

[Figure: inputs 2, 0, 0, 1, 0; after following the coordinator, all five cards show 1]

Page 33: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus Majority wins (breaking ties with the largest)

[Figure: inputs 2, 0, 0, 1, 0; after the majority rule, all five cards show 0]
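Both rules from the room example fit in a few lines when everybody sees every card. The function names and the choice of which person acts as coordinator are assumptions of this sketch.

```python
from collections import Counter


def follow_coordinator(cards, coordinator):
    """Everybody adopts the card raised by the coordinator (an index into cards)."""
    return cards[coordinator]


def majority_wins(cards):
    """Most frequent value wins; ties are broken in favor of the largest value."""
    counts = Counter(cards)
    return max(counts, key=lambda v: (counts[v], v))
```

With the cards 2, 0, 0, 1, 0 from the slides, `majority_wins` returns 0, and `follow_coordinator` returns 1 if the coordinator is the person holding the 1. Both rules rely on every participant seeing the same set of cards.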

Page 34: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus: failures are no problem (choose another coordinator, or the majority of the non-failed)

[Figure: cards 2, 0, 1, 0; one participant has failed (%!#)]

Page 35: Lecture 2 Introduction to Principles of Distributed Computing


A Solution to Consensus: … because this cannot happen!!

[Figure: the room with one failed participant (%!#); the card values shown are 2, 0, 1, 0, 1]

Page 36: Lecture 2 Introduction to Principles of Distributed Computing


Consensus in Distributed Systems. This can happen: delays

[Figure: one value, 1, is known; three other values are still unknown (?)]

Page 37: Lecture 2 Introduction to Principles of Distributed Computing


Consensus in Distributed Systems and then there are different views

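[Figure: processes hold different views of the inputs; the views shown include 01020 and 1020?, where ? marks a value that has not arrived]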

Page 38: Lecture 2 Introduction to Principles of Distributed Computing


Consensus in Distributed Systems so we try to reconcile views: another round

[Figure: the views from the previous slide (01020 and 1020?), plus a reconciled view 10201]

Page 39: Lecture 2 Introduction to Principles of Distributed Computing


Consensus in Distributed Systems but we could have the same problem!!

[Figure: the same views (01020 and 1020?), now with the view 10201 appearing twice]

Page 40: Lecture 2 Introduction to Principles of Distributed Computing


So, is consensus solvable? If so, how long does it take to solve it?

• It depends on what exactly the model is
• But what is a realistic model?
• And what are the common scenarios within the model? The nature of a distributed system is to include complex combinations of failures and delays

Page 41: Lecture 2 Introduction to Principles of Distributed Computing


Basic Model – asynchronous crash failure model

• Message passing (another option would be a shared memory model)

• Channels between every pair of processes

• Crash failures, with a bound t, t < n, on the number of potential failures out of n > 1 processes

• No message loss among correct processes

• Unbounded message delays, unpredictable processor speeds

Page 42: Lecture 2 Introduction to Principles of Distributed Computing


Distributed algorithms (protocols)

• A set of algorithms, each of which runs on a different processor (or as a thread in the same computer)

• The code includes instructions to communicate with other processors:
– Send (M) to p
– Upon receiving a message from q do
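The two communication instructions can be sketched with a tiny single-threaded simulator. Everything here is hypothetical scaffolding for illustration: the `Network` class, its seeded random delivery order (a stand-in for unpredictable delays), and the `Echo` process that answers "ping" with "pong".

```python
import random


class Network:
    """Holds messages in transit and delivers them one at a time in arbitrary order."""

    def __init__(self, seed=0):
        self.in_transit = []                 # (sender, receiver, message) triples
        self.rng = random.Random(seed)

    def send(self, sender, receiver, message):
        self.in_transit.append((sender, receiver, message))

    def run(self, processes):
        while self.in_transit:
            i = self.rng.randrange(len(self.in_transit))
            sender, receiver, message = self.in_transit.pop(i)
            processes[receiver].on_receive(sender, message)


class Echo:
    """Hypothetical process: replies 'pong' to every 'ping' it receives."""

    def __init__(self, pid, net):
        self.pid, self.net, self.log = pid, net, []

    def on_receive(self, q, message):            # "upon receiving a message from q do"
        self.log.append((q, message))
        if message == "ping":
            self.net.send(self.pid, q, "pong")   # "send (M) to p"


net = Network()
procs = {name: Echo(name, net) for name in ("p", "q")}
net.send("p", "q", "ping")
net.run(procs)
```

After the run, `procs["q"].log` holds the ping from p and `procs["p"].log` holds the pong from q. Each process only reacts to messages; it never reads the other's state directly.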

Page 43: Lecture 2 Introduction to Principles of Distributed Computing


A consensus protocol

1. val ← input
2. send val to all
3. wait until at least n - t messages have been received
4. let V[j] be the val received from process j, else ‘-’
5. return h (V) = largest value in V

– This same code is executed by every process
– Each one receives the value input from some application
– h is a predefined function that all processors know
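A sketch of one execution of this protocol, written so that the scheduler's choices are explicit. Several things are assumptions of the sketch rather than of the lecture: the names `decide` and `run_protocol`, the `heard_from` argument (which messages reach each process before it stops waiting), applying h only to the entries that are not ‘-’, and ignoring crashes so that every process decides.

```python
def decide(received, n, h=max):
    """Steps 4-5 for one process. `received` maps a sender index j to the val
    that arrived from j before the process stopped waiting."""
    V = [received.get(j, "-") for j in range(n)]   # '-' marks a missing entry
    return h([v for v in V if v != "-"])           # h = largest value by default


def run_protocol(inputs, t, heard_from, h=max):
    """inputs[j] is the input of process j. heard_from[i] is the set of processes
    whose message reached process i in time (chosen by the scheduler)."""
    n = len(inputs)
    decisions = []
    for i in range(n):
        # Step 3: a process waits for at least n - t messages.
        assert len(heard_from[i]) >= n - t, "too few messages to stop waiting"
        received = {j: inputs[j] for j in heard_from[i]}
        decisions.append(decide(received, n, h))
    return decisions
```

With n = 3 and t = 1, `run_protocol([1, 1, 1], 1, [{0, 1}, {1, 2}, {0, 2}])` returns `[1, 1, 1]`, while `run_protocol([0, 0, 1], 1, [{0, 1}, {0, 1}, {1, 2}])` returns `[0, 0, 1]`: the third process saw the value 1 and the other two did not. Whether the protocol is correct therefore depends on which input vectors are allowed.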

Page 44: Lecture 2 Introduction to Principles of Distributed Computing


Is this protocol correct ?

• It depends on the set C of possible inputs

• An input to the protocol is a vector I, where I[j] contains the local input of the j-th process

• The local input of pj is known only to pj

• And is taken from some universe of possible values V not including ‘-’

• Let C be the set of possible input vectors to the protocol

Page 45: Lecture 2 Introduction to Principles of Distributed Computing


Exercise 1

1. Define a set C as large as possible for which the protocol is correct
2. Prove that the protocol is correct for this C
3. Do you need to assume t < n / 2 ?

Namely, that for every I in C, in every execution with input I where at most t processes crash, the consensus requirements are satisfied:

Termination: eventually all correct processes decide
Agreement: correct processes’ decisions are the same
Validity: decision is input of one process

Page 46: Lecture 2 Introduction to Principles of Distributed Computing


Exercise 2

The protocol uses h (V) = largest value in V

1. Define another such function h’

2. Repeat the previous exercise with respect to your h’

Page 47: Lecture 2 Introduction to Principles of Distributed Computing


Exercise 3

Consider the set C that includes every possible input vector formed with values from V, where | V | is at least 2

1. Is there a function h for which the protocol is correct ?

If so, give one such h and prove the protocol is correct, otherwise, give a brief intuitive argument of why there is no such h

Page 48: Lecture 2 Introduction to Principles of Distributed Computing


Bibliography: theory of distributed computing textbooks

• Attiya, Welch, Distributed Computing, Wiley-Interscience, 2 ed., 2004

• Garg, Elements of Distributed Computing, Wiley-IEEE, 2002

• Lynch, Distributed Algorithms, Morgan Kaufmann, 1997

• Tel, Introduction to Distributed Algorithms, Cambridge U., 2 ed. 2001

Page 49: Lecture 2 Introduction to Principles of Distributed Computing


Bibliography: others

• Distributed Algorithms and Systems http://www.md.chalmers.se/~tsigas/DISAS/index.html

• Conferences: DISC, PODC,…

• Journals: Distributed Computing,…
– Special issue PODC 20th anniversary, Sept. 2003

• ACM SIGACT News Distributed Computing Column. Also one in EATCS Bulletin
