distributed systems principles and paradigmsiwanicki/courses/ds/2014/...2014/06/07 · distributed...

Distributed SystemsPrinciples and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer ScienceRoom R4.20, [email protected]

Chapter 05: Naming

Version: November 11, 2010

Contents

Chapter01: Introduction02: Architectures03: Processes04: Communication05: Naming06: Synchronization07: Consistency & Replication08: Fault Tolerance09: Security10: Distributed Object-Based Systems11: Distributed File Systems12: Distributed Web-Based Systems13: Distributed Coordination-Based Systems

2 / 38

Naming 5.1 Naming Entities

Naming Entities

Names, identifiers, and addressesName resolutionName space implementation

3 / 38

Naming

EssenceNames are used to denote entities in a distributed system. To operateon an entity, we need to access it at an access point. Access pointsare entities that are named by means of an address.

NoteA location-independent name for an entity E , is independent from theaddresses of the access points offered by E .

4 / 38

Identifiers

Pure nameA name that has no meaning at all; it is just a random string. Purenames can be used for comparison only.

IdentifierA name having the following properties:

P1: Each identifier refers to at most one entityP2: Each entity is referred to by at most one identifierP3: An identifier always refers to the same entity (prohibits reusingan identifier)

ObservationAn identifier need not necessarily be a pure name, i.e., it may havecontent.

5 / 38

Naming 5.2 Flat Naming

Flat naming

ProblemGiven an essentially unstructured name (e.g., an identifier), how canwe locate its associated access point?

Simple solutions (broadcasting)Home-based approachesDistributed Hash Tables (structured P2P)Hierarchical location service

6 / 38

Simple solutions

Broadcasting

Broadcast the ID, requesting the entity to return its current address.

Can never scale beyond local-area networksRequires all processes to listen to incoming location requests

Forwarding pointers

When an entity moves, it leaves behind a pointer to its next location

Dereferencing can be made entirely transparent to clients by simplyfollowing the chain of pointersUpdate a client’s reference when present location is foundGeographical scalability problems (for which separate chain reductionmechanisms are needed):

Long chains are not fault tolerantIncreased network latency at dereferencing

7 / 38

Simple solutions

Broadcasting

Broadcast the ID, requesting the entity to return its current address.

Can never scale beyond local-area networksRequires all processes to listen to incoming location requests

Forwarding pointers

When an entity moves, it leaves behind a pointer to its next location

Dereferencing can be made entirely transparent to clients by simplyfollowing the chain of pointersUpdate a client’s reference when present location is foundGeographical scalability problems (for which separate chain reductionmechanisms are needed):

Long chains are not fault tolerantIncreased network latency at dereferencing

7 / 38

Home-based approaches

Single-tiered schemeLet a home keep track of where the entity is:

Entity’s home address registered at a naming serviceThe home registers the foreign address of the entityClient contacts the home first, and then continues with foreignlocation

8 / 38

Host's present location

Client'slocation

1. Send packet to host at its home

2. Return addressof current location

3. Tunnel packet tocurrent location

4. Send successive packetsto current location

Host's homelocation

9 / 38

Two-tiered schemeKeep track of visiting entities:

Check local visitor register firstFall back to home location if local lookup fails

Problems with home-based approachesHome address has to be supported for entity’s lifetimeHome address is fixed⇒ unnecessary burden when the entitypermanently movesPoor geographical scalability (entity may be next to client)

QuestionHow can we solve the “permanent move” problem?

10 / 38

Distributed Hash Tables (DHT)

ChordConsider the organization of many nodes into a logical ring

Each node is assigned a random m-bit identifier.Every entity is assigned a unique m-bit key.Entity with key k falls under jurisdiction of node with smallestid ≥ k (called its successor).

NonsolutionLet node id keep track of succ(id) and start linear search along thering.

11 / 38

DHTs: Finger tables

PrincipleEach node p maintains a finger table FTp[] with at most m entries:

FTp[i] = succ(p +2i−1)

Note: FTp[i] points to the first node succeeding p by at least 2i−1.To look up a key k , node p forwards the request to node with indexj satisfying

q = FTp[j]≤ k < FTp[j +1]

If p < k < FTp[1], the request is also forwarded to FTp[1]

12 / 38

DHTs: Finger tables

12 / 38

DHTs: Finger tables

12 / 38

DHTs: Finger tables

15161718

1 42 43 94 95 18

1 92 93 94 145 20

1 112 113 144 185 28

1 142 143 184 205 28

1 182 183 184 285 1

1 202 203 284 285 4

1 212 283 284 285 4

1 282 283 284 15 9

1 12 13 14 45 14

Resolve k = 26 from node 1

Resolve k = 12 from node 28

i succ

Finger table

Actual node

13 / 38

Exploiting network proximity

ProblemThe logical organization of nodes in the overlay may lead to erratic messagetransfers in the underlying Internet: node k and node succ(k +1) may bevery far apart.

Topology-aware node assignment: When assigning an ID to a node, makesure that nodes close in the ID space are also close in the network. Canbe very difficult.

Proximity routing: Maintain more than one possible successor, and forward tothe closest.Example: in Chord FTp[i] points to first node inINT = [p +2i−1,p +2i −1]. Node p can also store pointers to othernodes in INT .

Proximity neighbor selection: When there is a choice of selecting who yourneighbor will be (not in Chord), pick the closest one.

14 / 38

Hierarchical Location Services (HLS)

Basic ideaBuild a large-scale search tree for which the underlying network isdivided into hierarchical domains. Each domain is represented by aseparate directory node.

A leaf domain, contained in S

Directory nodedir(S) of domain S

A subdomain Sof top-level domain T(S is contained in T)

Top-leveldomain T

The root directorynode dir(T)

15 / 38

HLS: Tree organization

Invariants

Address of entity E is stored in a leaf or intermediate nodeIntermediate nodes contain a pointer to a child iff the subtree rooted atthe child stores an address of the entityThe root knows about all entities

Domain D2Domain D1

Field with no data

Location recordwith only one field,containing an address

Field for domaindom(N) withpointer to N

Location recordfor E at node M

16 / 38

HLS: Lookup operation

Basic principles

Start lookup at local leaf nodeNode knows about E ⇒ follow downward pointer, else go upUpward lookup always stops at root

Domain D

Node has norecord for E, sothat request isforwarded toparent

Look-uprequest

Node knowsabout E, so requestis forwarded to child

17 / 38

HLS: Insert operation

Domain D

Node has norecord for E,so request isforwardedto parent

Insertrequest

Node knowsabout E, so requestis no longer forwarded

Node creates recordand stores pointer

Node createsrecord andstores address

18 / 38

Naming 5.3 Structured Naming

Name space

EssenceA graph in which a leaf node represents a (named) entity. A directory node isan entity that refers to other nodes.

.twmrc mbox

home keys

"/home/steen/mbox"

"/keys""/home/steen/keys"

Data stored in n1

Directory node

Leaf node

n2: "elke"n3: "max"n4: "steen"

NoteA directory node contains a (directory) table of (edge label, node identifier)pairs.

19 / 38

Name space

ObservationWe can easily store all kinds of attributes in a node, describing aspectsof the entity the node represents:

Type of the entityAn identifier for that entityAddress of the entity’s locationNicknames...

NoteDirectory nodes can also have attributes, besides just storing adirectory table with (edge label, node identifier) pairs.

20 / 38

Name space

ObservationWe can easily store all kinds of attributes in a node, describing aspectsof the entity the node represents:

Type of the entityAn identifier for that entityAddress of the entity’s locationNicknames...

NoteDirectory nodes can also have attributes, besides just storing adirectory table with (edge label, node identifier) pairs.

20 / 38

Name resolution

ProblemTo resolve a name we need a directory node. How do we actually find that(initial) node?

Closure mechanismThe mechanism to select the implicit context from which to start nameresolution:

www.cs.vu.nl: start at a DNS name server/home/steen/mbox: start at the local NFS file server (possible recursivesearch)0031204447784: dial a phone number130.37.24.8: route to the VU’s Web server

QuestionWhy are closure mechanisms always implicit?

21 / 38

Name resolution

21 / 38

Name resolution

21 / 38

Name linking

Hard linkWhat we have described so far as a path name: a name that isresolved by following a specific path in a naming graph from one nodeto another.

22 / 38

Name linking

Soft linkAllow a node O to contain a name of another node:

First resolve O’s name (leading to O)Read the content of O, yielding nameName resolution continues with name

ObservationsThe name resolution process determines that we read the contentof a node, in particular, the name in the other node that we needto go to.One way or the other, we know where and how to start nameresolution given name

23 / 38

Name linking

Soft linkAllow a node O to contain a name of another node:

First resolve O’s name (leading to O)Read the content of O, yielding nameName resolution continues with name

ObservationsThe name resolution process determines that we read the contentof a node, in particular, the name in the other node that we needto go to.One way or the other, we know where and how to start nameresolution given name

23 / 38

Name linking

.twmrc

"/home/steen/keys"

"/keys"n1

mbox "/keys"

Data stored in n6n4

elke steen

home keysData stored in n1

Directory node

Leaf node

n2: "elke"n3: "max"n4: "steen"

ObservationNode n5 has only one name

24 / 38

Name-space implementation

Basic issueDistribute the name resolution process as well as name space managementacross multiple machines, by distributing nodes of the naming graph.

Distinguish three levels

Global level: Consists of the high-level directory nodes. Main aspect isthat these directory nodes have to be jointly managed by differentadministrationsAdministrational level: Contains mid-level directory nodes that can begrouped in such a way that each group can be assigned to a separateadministration.Managerial level: Consists of low-level directory nodes within a singleadministration. Main issue is effectively mapping directory nodes to localname servers.

25 / 38

org netjp us

ai linda

jack jill

oce vu

ftp www

com edugov mil

index.txt

Mana-geriallayer

Adminis-trational

Globallayer

26 / 38

Item Global Administrational Managerial1 Worldwide Organization Department2 Few Many Vast numbers3 Seconds Milliseconds Immediate4 Lazy Immediate Immediate5 Many None or few None6 Yes Yes Sometimes

1: Geographical scale 4: Update propagation2: # Nodes 5: # Replicas3: Responsiveness 6: Client-side caching?

27 / 38

Iterative name resolution

1 resolve(dir,[name1,...,nameK]) sent to Server0 responsible for dir2 Server0 resolves resolve(dir,name1)→ dir1, returning the identification

(address) of Server1, which stores dir1.3 Client sends resolve(dir1,[name2,...,nameK]) to Server1, etc.

Client'snameresolver

Rootname server

Name servernl node

Name servervu node

Name servercs node

1. <nl,vu,cs,ftp>

2. #<nl>, <vu,cs,ftp>

3. <vu,cs,ftp>

4. #<vu>, <cs,ftp>

5. <cs,ftp>

6. #<cs>, <ftp>

Nodes aremanaged bythe same server

7. <ftp>

8. #<ftp>

#<nl,vu,cs,ftp><nl,vu,cs,ftp>

28 / 38

Recursive name resolution

1 resolve(dir,[name1,...,nameK]) sent to Server0 responsible for dir2 Server0 resolves resolve(dir,name1)→ dir1, and sends

resolve(dir1,[name2,...,nameK]) to Server1, which stores dir1.3 Server0 waits for result from Server1, and returns it to client.

Client'snameresolver

Rootname server

Name servernl node

Name servervu node

Name servercs node

1. <nl,vu,cs,ftp>

2. <vu,cs,ftp>

7. #<vu,cs,ftp>3. <cs,ftp>

6. #<cs,ftp>4. <ftp>

5. #<ftp>

#<nl,vu,cs,ftp>

8. #<nl,vu,cs,ftp>

<nl,vu,cs,ftp>

29 / 38

Caching in recursive name resolution

Server Should Looks up Passes to Receives Returnsfor node resolve child and caches to requester

cs <ftp> #<ftp> — — #<ftp>

vu <cs,ftp> #<cs> <ftp> #<ftp> #<cs>

#<cs, ftp>

nl <vu,cs,ftp> #<vu> <cs,ftp> #<cs> #<vu>

#<cs,ftp> #<vu,cs>

#<vu,cs,ftp>

root <nl,vu,cs,ftp> #<nl> <vu,cs,ftp> #<vu> #<nl>#<vu,cs> #<nl,vu>

#<vu,cs,ftp> #<nl,vu,cs>

#<nl,vu,cs,ftp>

30 / 38

Scalability issues

Size scalability

We need to ensure that servers can handle a large number of requests pertime unit⇒ high-level servers are in big trouble.

SolutionAssume (at least at global and administrational level) that content of nodeshardly ever changes. We can then apply extensive replication by mappingnodes to multiple servers, and start name resolution at the nearest server.

ObservationAn important attribute of many nodes is the address where the representedentity can be contacted. Replicating nodes makes large-scale traditionalname servers unsuitable for locating mobile entities.

31 / 38

Scalability issues

Size scalability

31 / 38

Scalability issues

Size scalability

31 / 38

Scalability issues

Geographical scalability

We need to ensure that the name resolution process scales across largegeographical distances.

Name servernl node

Name servervu node

Name servercs node

Client

Long-distance communication

Recursive name resolution

Iterative name resolution

ProblemBy mapping nodes to servers that can be located anywhere, we introduce animplicit location dependency.

32 / 38

Example: Decentralized DNS

Basic ideaTake a full DNS name, hash into a key k , and use a DHT-based system toallow for key lookups. Main drawback: You can’t ask for all nodes in asubdomain (but very few people were doing this anyway).

Information in a node

SOA Zone Holds info on the represented zoneA Host IP addr. of host this node representsMX Domain Mail server to handle mail for this nodeSRV Domain Server handling a specific serviceNS Zone Name server for the represented zoneCNAME Node Symbolic linkPTR Host Canonical name of a hostHINFO Host Info on this hostTXT Any kind Any info considered useful

33 / 38

DNS on Pastry

Pastry

DHT-based system that works with prefixes of keys. Consider a system inwhich keys come from a 4-digit number space. A node with ID 3210 keepstrack of the following nodes

nk prefix of ID(nk ) nk prefix of ID(nk )n0 0 n1 1n2 2 n30 30n31 31 n33 33n320 320 n322 322n323 323

NoteNode 3210 is responsible for handling keys with prefix 321. If it receives arequest for key 3012, it will forward the request to node n30. For DNS: A noderesponsible for key k stores DNS records of names with hash value k .

34 / 38

Replication of records

DefinitionReplicated at level i – record is replicated to all nodes with i matchingprefixes. Note: # hops for looking up record at level i is generally i .

ObservationLet xi denote the fraction of most popular DNS names of which therecords should be replicated at level i , then:

[d i(logN−C)

1+d + · · ·+d logN−1

]1/(1−α)

with N is the total number of nodes, d = b(1−α)/α and α ≈ 1, assumingthat popularity follows a Zipf distribution:The frequency of the n-th ranked item is proportional to 1/nα

35 / 38

Replication of records

MeaningIf you want to reach an average of C = 1 hops when looking up a DNSrecord, then with b = 4, α = 0.9, N = 10,000 and 1,000,000 recordsthat

61 most popular records should bereplicated at level 0

284 next most popular records at level 11323 next most popular records at level 26177 next most popular records at level 3

28826 next most popular records at level 4134505 next most popular records at level 5the rest should not be replicated

36 / 38

Naming 5.4 Attribute-Based Naming

Attribute-based naming

ObservationIn many cases, it is much more convenient to name, and look upentities by means of their attributes⇒ traditional directory services(aka yellow pages).

ProblemLookup operations can be extremely expensive, as they require tomatch requested attribute values, against actual attribute values⇒inspect all entities (in principle).

SolutionImplement basic directory service as database, and combine withtraditional structured naming system.

37 / 38

Example: LDAP

C = NL

O = Vrije Universiteit

OU = Comp. Sc.

Host_Name = star Host_Name = zephyr

CN = Main server

Attribute Value Attribute Value

Country NL Country NL

Locality Amsterdam Locality Amsterdam

Organization Vrije Universiteit Organization Vrije Universiteit

OrganizationalUnit Comp. Sc. OrganizationalUnit Comp. Sc.

CommonName Main server CommonName Main server

Host Name star Host Name zephyr

Host Address 192.31.231.42 Host Address 137.37.20.10

answer =search("&(C = NL) (O = Vrije Universiteit) (OU = *) (CN = Main server)")

38 / 38

distributed systems principles and paradigmsiwanicki/courses/ds/2014/...2014/06/07 · distributed...

Documents