advanced distributed algorithms and data structures

60
Advanced Distributed Algorithms and Data Structures Christian Scheideler Institut für Informatik Universität Paderborn

Upload: others

Post on 01-Mar-2022

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Advanced Distributed Algorithms and Data Structures

Advanced Distributed Algorithms and Data Structures

Christian ScheidelerInstitut für Informatik

Universität Paderborn

Page 2: Advanced Distributed Algorithms and Data Structures

Advanced Distributed Algorithms and Data Structures

Lecture: Thu 2-4 pm, F2.211Tutorial: Thu 10-11 am, F2.211 (starts 3rd week)

Website: http://cs.uni-paderborn.de/ti/lehre/veranstaltungen/ws-20162017/advanced-distributed-algorithms-and-data-structures/

Modules and Grading: • MuA Modules III.2.1, III.2.2 and III.2.4• 50%: oral exam, 50%: software project

both parts must be passed to pass the course

Prerequisites: • basic knowledge in algorithms and data structures• recommended: distributed algorithms and data structures course

WS 2016 2Chapter 1

Page 3: Advanced Distributed Algorithms and Data Structures

Advanced Distributed Algorithms and Data Structures

Homework assignments:• Weekly assignments each Thursday on

the website (starting with next week)• Theoretical and implementations

Slides and assignments: course websiteBook recommendations: no book available

(lecture is based on newest results)WS 2016 3Chapter 1

Page 4: Advanced Distributed Algorithms and Data Structures

Embedding into CS Curriculum

Data Structuresand Algorithms

Distributed Algorithmsand Data Structures

Advanced Distributed Algorithmsand Data Structures

Bachelor I

Bachelor II

Master

WS 2016 4Chapter 1

Page 5: Advanced Distributed Algorithms and Data Structures

Embedding into CS Curriculum

Advanced Distributed Algorithmsand Data Structures

Master

WS 2016 5Chapter 1

Project Group: Design of a Trusted Communication Module

Page 6: Advanced Distributed Algorithms and Data Structures

Advanced Distributed Algorithms and Data Structures

Goals:1. Introduction to advanced concepts in

distributed algorithms and data structures.

2. Introduction to important design methods.

3. Introduction to important analytical methods.

WS 2016 6Chapter 1

Page 7: Advanced Distributed Algorithms and Data Structures

Introduction

Sequential Algorithmsand Data Structures

Distributed Algorithmsand Data Structures

WS 2016 7Chapter 1

Page 8: Advanced Distributed Algorithms and Data Structures

Introduction

What are the basic problems fordistributed algorithms and data structures?

WS 2016 8Chapter 1

Page 9: Advanced Distributed Algorithms and Data Structures

IntroductionDefinition 1.1: A data structure is a certain way to

organize data in a computer so that operations like, forexample, search, insert, and delete are simple andeffective to realize.

Simple examples:• Lists

• Arrays

WS 2016 9Chapter 1

Page 10: Advanced Distributed Algorithms and Data Structures

IntroductionBasic view:

Data Structure

Operation 1

Operation 2

Operation 3

WS 2016 10Chapter 1

Page 11: Advanced Distributed Algorithms and Data Structures

Introduction

Classical case: computer with one processor

Operations

Storage

Data structure

WS 2016 11Chapter 1

Page 12: Advanced Distributed Algorithms and Data Structures

Introduction

Computer with several processors/cores:

Storage

Data structure

WS 2016 12Chapter 1

Page 13: Advanced Distributed Algorithms and Data Structures

Computer with several processors/cores:

Data structure

Introduction

Cache Cache Cache Cache

Overlaps:• access conflicts (correctness)• performance problems (efficiency)

WS 2016 13Chapter 1

Page 14: Advanced Distributed Algorithms and Data Structures

IntroductionMultiple computers:

Problem: distribution of DS among computers

Data structure

WS 2016 14Chapter 1

Page 15: Advanced Distributed Algorithms and Data Structures

IntroductionMultiple computers:

Problem: distribution of DS among computers

Da ta str uct ure

WS 2016 15Chapter 1

Page 16: Advanced Distributed Algorithms and Data Structures

IntroductionMultiple computers:

Basic problems: • How to interconnect the computers?• How to coordinate the management of the DS

among the computers?

Da ta str uct ure

WS 2016 16Chapter 1

Page 17: Advanced Distributed Algorithms and Data Structures

IntroductionHow to represent the connections between the computers?

A knows (IP address, MAC address,… of) resp. has accessautorization for B : network can send message from A to B

High-level view: A knows B ⇒ overlay edge (A,B) from A to B ( A B )

Set of all overlay edges forms directed graph known as overlaynetwork.

A BCommunication network

(Internet, ad-hoc network,…)

WS 2016 Chapter 1 17

Page 18: Advanced Distributed Algorithms and Data Structures

Introduction

Problem: find suitable overlay network for thecomputers / processes

Best topology depends on problem and context.WS 2016 18Chapter 1

A

B

A and B know each other

Page 19: Advanced Distributed Algorithms and Data Structures

Introduction

WS 2016 Chapter 1 19

Distributed Algorithmsand Data Structures

Advanced Distributed Algorithmsand Data Structures

Extensive study ofoverlay networks

Assumption:processes form a clique

Still many problems left: link management, accesscontrol, synchronization, communication primitives,transactions, and various applications.

Page 20: Advanced Distributed Algorithms and Data Structures

Introduction

WS 2016 Chapter 1 20

Distributed Algorithmsand Data Structures

Advanced Distributed Algorithmsand Data Structures

Extensive study ofoverlay networks

Assumption:processes form a clique

Focus: strategies that are scalable, robust andsecure.

Page 21: Advanced Distributed Algorithms and Data Structures

Introduction

WS 2016 Chapter 1 21

Focus: strategies that are scalable, robust andsecure (because participants might be faulty oradversarial, or might get attacked from outside!)

Da ta str uct ure

Page 22: Advanced Distributed Algorithms and Data Structures

Introduction

WS 2016 Chapter 1 22

Central properties a system should satisfy:• Robustness: availability• Security: integrity and confidentiality

Da ta str uct ure

Page 23: Advanced Distributed Algorithms and Data Structures

Introduction

Four Commandments:1. You shall not sleep2. You shall not fail3. You shall not lie4. You shall not leak

!

Page 24: Advanced Distributed Algorithms and Data Structures

Introduction

Four Commandments:1. You shall not sleep2. You shall not fail3. You shall not lie4. You shall not leak

Measures against violations:Quite challenging…

availability

integrityconfidentiality

Page 25: Advanced Distributed Algorithms and Data Structures

IntroductionExamples of violations:1. You shall not sleep

→ asynchronicity2. You shall not fail

→ message loss, process/link failures,Denial-of-Service attacks

3. You shall not lie→ corrupted information/messages/protocols

4. You shall not leak→ outsider (e.g., man-in-the-middle) or insider(e.g., through viruses) attacks

Page 26: Advanced Distributed Algorithms and Data Structures

IntroductionMeasures to be robust to asynchronicity:

Decoupling of time and flow

• Time decoupling: interacting processes do not need to be available at the same time (usually needs mediator)

• Flow decoupling: the execution of an action within a process should not depend on the availability of another process (no remote calls asking for immediate return values)

26VADS - Kapitel 3SS 2016

Page 27: Advanced Distributed Algorithms and Data Structures

IntroductionMeasures against message loss and failures:• Reactive approaches: recovery from any

illegal state (such systems known as self-healing / self-stabilizing)

• Proactive approaches: maintain availabilityeven in faulty states (requires redundancy)

Measures against Denial-of-Service attacks:• Standard approach: enforce confidentiality to

make targeted DoS attacks hard

WS 2016 Chapter 1 27

Page 28: Advanced Distributed Algorithms and Data Structures

IntroductionMeasures against corrupted messages andinformation:• Algorithmic approach: use redundancy• Cryptographic approach: integrity measures

(encode messages and information so thatcorrectness can be checked)

Measures against corrupted protocols:• Hard and expensive (→ Byzantine general

models)

WS 2016 Chapter 1 28

Page 29: Advanced Distributed Algorithms and Data Structures

Introduction

Measures against leaking:• Algorithmic approach: secret sharing• Cryptographic approach: use encryption

In the real world, leaking is hard to avoid.Main problem: exposure!

WS 2016 Chapter 1 29

Page 30: Advanced Distributed Algorithms and Data Structures

Introduction• SPAM:

Anyone can send you an email anddisseminate your email address.

• DoS attacks:Anyone who has your IP address can send you a message and freely disseminate your IP address.

• Viruses:Once a virus is in your system, ithas access to all information in it.

Page 31: Advanced Distributed Algorithms and Data Structures

IntroductionOwner consent and control:• Clearly defined responsibilities• Complete control over own info & resources

Least exposure:• Not more knowledge than necessary• Complete control over information flow

Self-recovery:• Recovery from every possible state (as long

as underlying layer is still in a legal state)

Decoupling: • No synchronization necessary for primitives

Bachelorcourse:overlays

Page 32: Advanced Distributed Algorithms and Data Structures

IntroductionOwner consent and control:• Clearly defined responsibilities• Complete control over own info & resources

Least exposure:• Not more knowledge than necessary• Complete control over information flow

Problem: current systems cannot enforce these rules(because they are too open, too complex)

Page 33: Advanced Distributed Algorithms and Data Structures

IntroductionOwner consent and control:• Clearly defined responsibilities• Complete control over own info & resources

Least exposure:• Not more knowledge than necessary• Complete control over information flow

Not even suitable models and primitives available to rigorously study guidelines in the algorithms community.So viruses, trojans, and DoS attacks are usually ignored.

In this lecture: new approach

Page 34: Advanced Distributed Algorithms and Data Structures

IntroductionNew approach: Trusted Communication Model (TCM)

• AL (Application Layer): large storage capacity and computationalpower, but potentially insecure

• TCL (Trusted Communication Layer): low storage capacity andcomputational power but can securely manage ports and keys andcan securely execute basic primitives

WS 2016 Chapter 1 34

TCL InternetAL

Page 35: Advanced Distributed Algorithms and Data Structures

IntroductionNew approach: Trusted Communication Model (TCM)

• AL: can be invaded• TCL: cannot be invaded or inspectedGoal of TCL: support AL in ensuring availability, integrity, andconfidentiality

WS 2016 Chapter 1 35

TCL InternetAL

Page 36: Advanced Distributed Algorithms and Data Structures

Introduction

WS 2016 Chapter 1 36

Application Layer (HTTP)

Presentation Layer (SMTP)

Session Layer (NCP)

Transport Layer (TCP/UDP)

Network Layer (IP)

Data Link Layer (Ethernet)

Physical Layer (MAC)

ISO OSI-Model:

AL

TCL

Page 37: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 37Chapter 1

Page 38: Advanced Distributed Algorithms and Data Structures

Graph Theory

• Introduction to fundamental topologies

• Basic graph parameters(degree, diameter, expansion,…)

WS 2016 38Chapter 1

Page 39: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 39Chapter 1

Page 40: Advanced Distributed Algorithms and Data Structures

Probability Theory

• Random variable• Expectation• Variance• Markov inequality• Chernoff bounds

WS 2016 Chapter 1 40

Page 41: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 41Chapter 1

Page 42: Advanced Distributed Algorithms and Data Structures

Link PrimitivesSafe forms of

WS 2016 Chapter 1 42

uv

wu

v

w

uv

wu

v

w

u u vv

u u vv

Introduction Fusion

Delegation Reversal

Page 43: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 43Chapter 1

Page 44: Advanced Distributed Algorithms and Data Structures

TCM Model andProgramming Environment

Processes can interconnect (using link primitives) and execute actions.

Types of actions:• Triggered by a local/remote call:

⟨name⟩(⟨parameters⟩) → ⟨commands⟩• Triggered by a local state:

⟨name⟩: ⟨predicate⟩ → ⟨commands⟩All messages are remote action calls.

Example:minimum(x,y) →

if x<y then m:=x else m:=yprint(m) no return command!

Action „minimum“ is executed whenever a request to callminimum(x,y) is processed.

WS 2016 Chapter 1 44

Page 45: Advanced Distributed Algorithms and Data Structures

TCM Model andProgramming Environment

Processes can interconnect (using link primitives) and execute actions.

Types of actions:• Triggered by a local/remote call:

⟨name⟩(⟨parameters⟩) → ⟨commands⟩• Triggered by a local state:

⟨name⟩: ⟨predicate⟩ → ⟨commands⟩All messages are remote action calls.

Example:timeout: true →

print(„I am still alive!“)

„true“ ensures that action timeout is periodically executed by thegiven peer (length of period handled by access control).

WS 2016 Chapter 1 45

Page 46: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 46Chapter 1

Page 47: Advanced Distributed Algorithms and Data Structures

Congestion Control

Problem: nodes need to periodically contacttheir neighbors (synchronization, failuredetection,…) without overwhelming them

WS 2016 Chapter 1 47

Page 48: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 48Chapter 1

Page 49: Advanced Distributed Algorithms and Data Structures

Clock Synchronization

The physical clocks of the processes shouldshow the same time.

WS 2016 Chapter 1 49

Page 50: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 50Chapter 1

Page 51: Advanced Distributed Algorithms and Data Structures

Logical ClocksEach request obtains a logical time stamp when processedso that requests can be topologically sorted.

→ Important for transactions, snapshots, …

WS 2016 Chapter 1 51

Process 1

Process 2

Process 3

Process 4

Logical time

: requestprocessing

1

2

3 4

5

6

7

8

9

10 11

12

13

14

15

Page 52: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 52Chapter 1

Page 53: Advanced Distributed Algorithms and Data Structures

Dynamic Overlay Networks• Self-stabilizing clique and 2-clique (diameter 2)

• Other topologies: see Bachelor lecture

WS 2016 Chapter 1 53

Page 54: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 54Chapter 1

Page 55: Advanced Distributed Algorithms and Data Structures

Broadcasting and AnycastingBroadcasting: send a message to all nodes

Anycasting: send a message to a random nodeWS 2016 Chapter 1 55

2

2 2

2

2 2

2

Page 56: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 56Chapter 1

Page 57: Advanced Distributed Algorithms and Data Structures

Distributed Commit

WS 2016 Chapter 1 57

commit 2

commit 4

2 2

2

2 2

2

Page 58: Advanced Distributed Algorithms and Data Structures

Advanced distributed algorithms and data structures

Contents:1. Introduction2. Graph theory3. Probability theory4. Link primitives5. TCM model and programming environment6. Congestion control7. Clock synchronization8. Logical Clocks9. Dynamic Overlay Networks10. Broadcasting and Anycasting11. Distributed Commit12. ApplicationsWS 2016 58Chapter 1

Page 59: Advanced Distributed Algorithms and Data Structures

Applications

• Authenticated information system• Crypto currencies• Publish / subscribe systems

(may still change)

WS 2016 Chapter 1 59

Page 60: Advanced Distributed Algorithms and Data Structures

Questions?

WS 2016 60Chapter 1