
Page 1: Tapestry: Scalable and Fault-tolerant  Routing and Location

Tapestry: Scalable and Fault-tolerant Routing and Location

Stanford Networking Seminar, October 2001

Ben Y. Zhao, [email protected]

Page 2: Tapestry: Scalable and Fault-tolerant  Routing and Location


Challenges in the Wide-area

Trends: exponential growth in CPU and storage; network expanding in reach and bandwidth

Can applications leverage these new resources?
Scalability: increasing users, requests, traffic
Resilience: more components → inversely lower MTBF
Management: intermittent resource availability → complex management schemes

Proposal: an infrastructure that solves these issues and passes benefits onto applications

Page 3: Tapestry: Scalable and Fault-tolerant  Routing and Location


Driving Applications

Leverage of cheap & plentiful resources: CPU cycles, storage, network bandwidth

Global applications share distributed resources:
Shared computation: SETI, Entropia
Shared storage: OceanStore, Gnutella, Scale-8
Shared bandwidth: application-level multicast, content distribution networks

Page 4: Tapestry: Scalable and Fault-tolerant  Routing and Location


Key: Location and Routing

Hard problem: locating and routing messages to resources and data

Goals for a wide-area overlay infrastructure:
Easy to deploy
Scalable to millions of nodes, billions of objects
Available in the presence of routine faults
Self-configuring, adaptive to network changes
Localized effects of operations/failures

Page 5: Tapestry: Scalable and Fault-tolerant  Routing and Location


Talk Outline

Motivation

Tapestry overview

Fault-tolerant operation

Deployment / evaluation

Related / ongoing work

Page 6: Tapestry: Scalable and Fault-tolerant  Routing and Location


What is Tapestry?

A prototype of a decentralized, scalable, fault-tolerant, adaptive location and routing infrastructure (Zhao, Kubiatowicz, Joseph, et al., U.C. Berkeley)

Network layer of OceanStore

Routing: suffix-based hypercube routing, similar to Plaxton, Rajaraman, Richa (SPAA '97)

Decentralized location: Virtual hierarchy per object with cached location references

Core API:
publishObject(ObjectID, [serverID])
routeMsgToObject(ObjectID)
routeMsgToNode(NodeID)
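The calls above form a thin location-and-routing interface. A minimal Python stub of that surface (names kept from the slide, behavior paraphrased in comments; the real system is the Java implementation described later, so this is illustration only):

    class TapestryAPI:
        """Sketch of the core Tapestry calls listed above (illustrative only)."""

        def publishObject(self, objectID, serverID=None):
            # Advertise that objectID is stored at serverID (default: this node)
            # by planting location pointers along the route to the object's root.
            raise NotImplementedError

        def routeMsgToObject(self, objectID):
            # Route the pending message toward some server holding objectID,
            # following pointers discovered on the way to the object's root.
            raise NotImplementedError

        def routeMsgToNode(self, nodeID):
            # Route the pending message to the live node whose ID matches nodeID.
            raise NotImplementedError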

Page 7: Tapestry: Scalable and Fault-tolerant  Routing and Location


Routing and Location

Namespace (nodes and objects): 160 bits → ~2^80 names before name collision

Each object has its own hierarchy rooted at its Root: f(ObjectID) = RootID, via a dynamic mapping function

Suffix routing from A to B: at the h-th hop, arrive at the nearest node hop(h) such that hop(h) shares a suffix of length h digits with B

Example: 5324 routes to 0629 via 5324 → 2349 → 1429 → 7629 → 0629

Object location: the root is responsible for storing the object's location; publish and search both route incrementally toward the root
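The hop rule is easy to state in code. Below is a small sketch (hypothetical helper names, a toy node set, and a "least-specific match" rule standing in for the real network-distance-based "nearest node" choice) that reproduces the 5324 → 0629 example:

    def shared_suffix_len(a, b):
        """Number of trailing digits that two IDs have in common."""
        n = 0
        for x, y in zip(reversed(a), reversed(b)):
            if x != y:
                break
            n += 1
        return n

    def route(source, dest, nodes):
        """Resolve one more suffix digit of dest per hop (toy 'nearest' rule)."""
        path, current = [source], source
        for h in range(1, len(dest) + 1):
            if shared_suffix_len(current, dest) >= h:
                continue                      # this digit is already resolved
            candidates = [n for n in nodes if shared_suffix_len(n, dest) >= h]
            if not candidates:
                break                         # no closer node: surrogate root
            # a real node picks the network-closest match from its level-h table;
            # here we take the least-specific match so the slide's path falls out
            current = min(candidates, key=lambda n: shared_suffix_len(n, dest))
            path.append(current)
        return path

    print(route("5324", "0629", {"2349", "1429", "7629", "0629"}))
    # -> ['5324', '2349', '1429', '7629', '0629']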

Page 8: Tapestry: Scalable and Fault-tolerant  Routing and Location


Publish / Lookup

Publish object with ObjectID:   // route toward the "virtual root," ID = ObjectID

For (i = 0; i < Log2(N); i += j) {   // define the hierarchy; j = # of bits per digit (e.g., j = 4 for hex digits)
  Insert an entry into the nearest node that matches on the last i bits
  If no match is found, deterministically choose an alternative
  The real root node is found when no external routes are left
}

Lookup object: traverse the same path toward the root as publish, except search for an entry at each node

For (i = 0; i < Log2(N); i += j) {
  Search for a cached object location
  Once found, route via IP or Tapestry to the object
}
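To make the two loops concrete, here is a self-contained toy sketch (invented helper names, a 4-hex-digit namespace, and a simplified deterministic "nearest node" / surrogate-root rule, not the actual Tapestry code): publish drops a location pointer at every hop toward the object's root, and lookup walks the same path until it hits one.

    import hashlib

    DIGITS = 4                      # toy 4-hex-digit namespace (real IDs use 40)

    def shared_suffix_len(a, b):
        n = 0
        for x, y in zip(reversed(a), reversed(b)):
            if x != y:
                break
            n += 1
        return n

    def object_root_id(object_id):
        """Dynamic mapping f(ObjectID) = RootID, here just a truncated SHA-1."""
        return hashlib.sha1(object_id.encode()).hexdigest()[:DIGITS]

    def path_to_root(start, root, nodes):
        """Hops visited while resolving the root ID one suffix digit at a time."""
        path = [start]
        for h in range(1, DIGITS + 1):
            matches = [n for n in nodes if shared_suffix_len(n, root) >= h]
            if not matches:
                # no node matches this many digits: fall back deterministically to
                # the best-matching existing node, which acts as the surrogate root
                matches = [min(nodes, key=lambda n: (-shared_suffix_len(n, root), n))]
            nxt = min(matches, key=lambda n: (shared_suffix_len(n, root), n))
            if nxt != path[-1]:
                path.append(nxt)
            if shared_suffix_len(nxt, root) < h:
                break               # surrogate reached; no closer node exists
        return path

    def publish(object_id, server, nodes, pointers):
        root = object_root_id(object_id)
        for node in path_to_root(server, root, nodes):
            pointers.setdefault(node, {})[object_id] = server   # drop a pointer

    def lookup(object_id, client, nodes, pointers):
        root = object_root_id(object_id)
        for node in path_to_root(client, root, nodes):
            if object_id in pointers.get(node, {}):
                return pointers[node][object_id]
        return None

    nodes = {"5324", "2349", "1429", "7629", "0629"}
    pointers = {}
    publish("report.pdf", "5324", nodes, pointers)
    print(lookup("report.pdf", "0629", nodes, pointers))   # prints: 5324

Because the hops toward a given root are deterministic here, the publish path and the lookup path converge, so the lookup from any node finds a pointer back to the publishing server.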

Page 9: Tapestry: Scalable and Fault-tolerant  Routing and Location


Tapestry Mesh: Incremental Suffix-based Routing

[Figure: mesh of nodes labeled by 4-digit hex IDs (0x43FE, 0x13FE, 0xABFE, 0x1290, 0x239E, 0x73FE, ...), connected by routing links annotated with the digit position (1-4) each hop resolves.]

Page 10: Tapestry: Scalable and Fault-tolerant  Routing and Location


Routing in Detail

Neighbor map for node "5712" (octal, 4 digits): each routing level holds one entry per digit value, and the node fills its own digit's slot with itself.

Level 1 (last digit):     xxx0  xxx1  5712  xxx3  xxx4  xxx5  xxx6  xxx7
Level 2 (last 2 digits):  xx02  5712  xx22  xx32  xx42  xx52  xx62  xx72
Level 3 (last 3 digits):  x012  x112  x212  x312  x412  x512  x612  5712
Level 4 (full ID):        0712  1712  2712  3712  4712  5712  6712  7712

[Figure: nodes 5712, 0880, 3210, 4510, 7510 each drawn with their eight per-level digit slots.]

Example: octal digits, 2^12 namespace; 5712 routes to 7510 via 5712 → 0880 → 3210 → 4510 → 7510, resolving one more suffix digit of 7510 at each hop.
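A sketch of how such a table could be built (hypothetical helper and a toy node set; a real node would fill each slot with the network-closest candidate rather than the first one found):

    def neighbor_map(node_id, known_nodes, digits="01234567"):
        """Level h, slot d: some node ending in d + (last h-1 digits of node_id)."""
        levels = []
        for h in range(1, len(node_id) + 1):
            shared_tail = node_id[len(node_id) - h + 1:]     # last h-1 digits
            row = {}
            for d in digits:
                want = d + shared_tail
                row[d] = next((n for n in known_nodes if n.endswith(want)), None)
            row[node_id[-h]] = node_id                       # our own digit slot
            levels.append(row)
        return levels

    for h, row in enumerate(neighbor_map("5712", {"0880", "3210", "4510", "7510"}), 1):
        print(f"level {h}:", row)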

Page 11: Tapestry: Scalable and Fault-tolerant  Routing and Location


Object Location: Randomization and Locality

Page 12: Tapestry: Scalable and Fault-tolerant  Routing and Location


Talk Outline

Motivation

Tapestry overview

Fault-tolerant operation

Deployment / evaluation

Related / ongoing work

Page 13: Tapestry: Scalable and Fault-tolerant  Routing and Location


Fault-tolerant Location

Minimized soft-state vs. explicit fault-recovery

Redundant roots: object names hashed with small salts → multiple names/roots
Queries and publishing utilize all roots in parallel
P(finding a reference under partition) = 1 − (1/2)^n, where n = # of roots

Soft-state periodic republish: 50 million files/node, daily republish, b = 16, N = 2^160, 40 B/msg
Worst-case update traffic: 156 kb/s; expected traffic with 2^40 real nodes: 39 kb/s
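A toy illustration of the salted-roots idea (hash choice and helper names are assumptions, not the Tapestry code): each salt yields an independent root ID, and a query succeeds if any one of the n root trees is reachable.

    import hashlib

    def salted_root_ids(object_name, num_roots=4, hex_digits=40):
        """One 160-bit ID (40 hex digits) per salt; each maps to its own root."""
        return [hashlib.sha1(f"{object_name}#{salt}".encode()).hexdigest()[:hex_digits]
                for salt in range(num_roots)]

    # If each root is independently unreachable with probability 1/2 during a
    # partition, at least one root is still found with probability 1 - (1/2)^n.
    n = 4
    print(salted_root_ids("report.pdf"))
    print(1 - (1 / 2) ** n)          # 0.9375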

Page 14: Tapestry: Scalable and Fault-tolerant  Routing and Location


Fault-tolerant Routing

Strategy:
Detect failures via soft-state probe packets
Route around a problematic hop via backup pointers

Handling:
3 forward pointers per outgoing route (primary + 2 backups)
"Second chance" algorithm for intermittent failures
Upgrade backup pointers to replace failed primaries

Protocols:
First Reachable Link Selection (FRLS)
Proactive Duplicate Packet Routing
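A rough sketch of the per-entry bookkeeping this implies (the structure and the one-miss threshold below are illustrative assumptions, not the deployed design): each routing entry keeps a primary link plus two backups, a link that misses one probe gets a second chance, and FRLS forwards on the first link still marked reachable.

    from dataclasses import dataclass, field

    @dataclass
    class Link:
        node_id: str
        alive: bool = True
        misses: int = 0                               # consecutive missed probes

    @dataclass
    class RouteEntry:
        links: list = field(default_factory=list)     # [primary, backup1, backup2]

        def probe_result(self, node_id, reachable):
            for link in self.links:
                if link.node_id == node_id:
                    if reachable:
                        link.misses, link.alive = 0, True
                    else:
                        link.misses += 1
                        link.alive = link.misses < 2  # second chance, then demote

        def first_reachable(self):
            """First Reachable Link Selection: forward on the first live pointer."""
            return next((l for l in self.links if l.alive), None)

    entry = RouteEntry([Link("7510"), Link("4510"), Link("3210")])
    entry.probe_result("7510", reachable=False)   # one miss: still gets a chance
    entry.probe_result("7510", reachable=False)   # second miss: marked down
    print(entry.first_reachable().node_id)        # -> 4510 (first backup)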

Page 15: Tapestry: Scalable and Fault-tolerant  Routing and Location


Summary

Decentralized location and routing infrastructure
Core routing similar to PRR97
Distributed algorithms for object-root mapping, node insertion / deletion
Fault handling with redundancy, soft-state beacons, self-repair
Decentralized and scalable, with locality

Analytical properties:
Per-node routing table size: b·Log_b(N), where N = size of the namespace
Find an object in Log_b(n) overlay hops, where n = # of physical nodes
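As a quick sanity check under the parameters used elsewhere in the deck (hex digits, 160-bit namespace), the table-size formula works out to a few hundred entries per node:

    import math

    b, N = 16, 2 ** 160                 # hex digits, 160-bit namespace
    levels = math.log(N, b)             # = 40 routing levels
    print(round(b * levels))            # b * log_b(N) = 640 entries per node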

Page 16: Tapestry: Scalable and Fault-tolerant  Routing and Location


Talk Outline

Motivation

Tapestry overview

Fault-tolerant operation

Deployment / evaluation

Related / ongoing work

Page 17: Tapestry: Scalable and Fault-tolerant  Routing and Location


Deployment Status

Java implementation in OceanStore
Running static Tapestry
Deploying dynamic Tapestry with fault-tolerant routing

Packet-level simulator
Delay measured in network hops
No cross traffic or queuing delays
Topologies: AS, MBone, GT-ITM, TIERS

ns2 simulations

Page 18: Tapestry: Scalable and Fault-tolerant  Routing and Location


Evaluation Results

Cached object pointers
Efficient lookup for nearby objects
Reasonable storage overhead

Multiple object roots
Improves availability under attack
Improves performance and performance stability

Reliable packet delivery
Redundant pointers approximate optimal reachability
FRLS: a simple fault-tolerant UDP protocol

Page 19: Tapestry: Scalable and Fault-tolerant  Routing and Location


First Reachable Link Selection

Use periodic UDP packets to gauge link condition

Packets routed to shortest “good” link

Assumes IP cannot correct routing table in time for packet delivery

[Figure: example path through nodes A, B, C, D, E comparing the IP route and the Tapestry route when "no path exists to dest." at the IP layer.]
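A bare-bones sketch of the probing side (addresses, port, and probe format are assumptions for illustration): send a small UDP packet to each neighbor on a timer and mark the link "good" only if an echo returns before a short timeout.

    import socket, time

    NEIGHBORS = [("10.0.0.2", 9000), ("10.0.0.3", 9000)]   # hypothetical probe targets

    def probe(addr, timeout=0.5):
        """True if the neighbor echoes our probe within the timeout."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            try:
                s.sendto(b"tapestry-probe", addr)
                s.recvfrom(64)
                return True
            except OSError:               # timeout or ICMP error: link looks down
                return False

    for _ in range(3):                    # a real node would loop forever
        print({addr: probe(addr) for addr in NEIGHBORS})   # feed into FRLS
        time.sleep(1.0)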

Page 20: Tapestry: Scalable and Fault-tolerant  Routing and Location


Talk Outline

Motivation

Tapestry overview

Fault-tolerant operation

Deployment / evaluation

Related / ongoing work

Page 21: Tapestry: Scalable and Fault-tolerant  Routing and Location


Bayeux

Global-scale application-level multicast (NOSSDAV 2001)

Scalability: scales to > 10^5 nodes; self-forming member group partitions

Fault tolerance: multicast root replication; FRLS for resilient packet delivery

More optimizations: group ID clustering for better bandwidth utilization

Page 22: Tapestry: Scalable and Fault-tolerant  Routing and Location


Bayeux: Multicast

[Figure: Tapestry mesh of hex-labeled nodes (13FE, ABFE, 1290, 239E, 73FE, ...); a multicast root forwards packets along suffix-routing paths (e.g., through 793E and 79FE) to two receivers.]

Page 23: Tapestry: Scalable and Fault-tolerant  Routing and Location


Bayeux: Tree Partitioning

[Figure: the same mesh with the multicast root replicated; receivers send JOIN messages and are partitioned across the replicated roots.]

Page 24: Tapestry: Scalable and Fault-tolerant  Routing and Location


Overlay Routing Networks

CAN: Ratnasamy et al. (ACIRI / UCB)
Uses a d-dimensional coordinate space to implement a distributed hash table
Routes to the neighbor closest to the destination coordinate
Properties: fast insertion / deletion; constant-sized routing state; unconstrained # of hops; overlay distance not proportional to physical distance

Chord: Stoica, Morris, Karger, et al. (MIT / UCB)
Linear namespace modeled as a circular address space
"Finger table" points to a logarithmic # of increasingly remote hosts
Properties: simplicity in algorithms; fast fault-recovery; Log2(N) hops and routing state; overlay distance not proportional to physical distance

Pastry: Rowstron and Druschel (Microsoft / Rice)
Hypercube routing similar to PRR97
Objects replicated to servers by name
Properties: fast fault-recovery; Log(N) hops and routing state; data replication required for fault-tolerance

Page 25: Tapestry: Scalable and Fault-tolerant  Routing and Location


Ongoing Research

Fault-tolerant routing: Reliable Overlay Networks (MIT); Fault-tolerant Overlay Routing (UCB)

Application-level multicast: Bayeux (UCB), CAN (AT&T), Scribe and Herald (Microsoft)

File systems: OceanStore (UCB), PAST (Microsoft / Rice), Cooperative File System (MIT)

Page 26: Tapestry: Scalable and Fault-tolerant  Routing and Location


For More Information

Tapestry: http://www.cs.berkeley.edu/~ravenben/tapestry

OceanStore: http://oceanstore.cs.berkeley.edu

Related papers: http://oceanstore.cs.berkeley.edu/publications

http://www.cs.berkeley.edu/~ravenben/publications

[email protected]