
DHT* Applications Jeffrey Pang CMU NetTalk, Dec. 5, 2003 * and DOLR


DHT* Applications

Jeffrey Pang
CMU NetTalk, Dec. 5, 2003

* and DOLR


Brief Review of DHTs

Many DHTs:
- PRR Trees, Pastry, Tapestry
- Chord, Symphony
- CAN
- SkipNet, Kademlia, Koorde, Viceroy, etc., etc.

Good Properties:
- Distributed construction/maintenance
- Load-balanced with uniform identifiers
- O(log n) hops / neighbors per node
- Provides underlying network proximity


Brief Review of DHTs

[Figure: routing example with node IDs 247B, 9F10, 9A76, 9AE2]


Overview of Talk

- Review of DHTs
- DHT vs DOLR
- Storage
- Multicast
- Database
- Misc.
- API and Infrastructure Proposals



Distributed Hash Table
- Paradigm: Location of objects determined by overlay
- put(key, object)
- get(key, object)

Distributed Object Location and Routing
- Paradigm: Location of objects determined by application
- Application publishes pointers in overlay
- publish(key, id)
- locate(key)


DHT Paradigm

[Figure labels: obj, key, obj]

put(key, object)
get(key, object)


DOLR Paradigm


publish(key, id) locate(key)

- back pointer


Of course, many apps use a little bit of both paradigms...
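The difference between the two paradigms can be sketched in a few lines. This is only an illustrative single-process toy, not any real overlay's API: the class name `ToyOverlay`, the modulo-based `root` mapping, and the node names are assumptions made for the example, and `get` is written here as returning the object for a key.

```python
import hashlib


class ToyOverlay:
    """Single-process stand-in for an overlay (hypothetical, for illustration)."""

    def __init__(self, node_names):
        self.nodes = sorted(node_names)
        self.stored = {n: {} for n in self.nodes}    # DHT: objects live at the root
        self.pointers = {n: {} for n in self.nodes}  # DOLR: only pointers live at the root

    def root(self, key):
        # Assumption: a simple hash-mod mapping picks the responsible node.
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[digest % len(self.nodes)]

    # DHT paradigm: the overlay decides where the object is stored.
    def put(self, key, obj):
        self.stored[self.root(key)][key] = obj

    def get(self, key):
        return self.stored[self.root(key)].get(key)

    # DOLR paradigm: the application keeps the object and publishes a pointer.
    def publish(self, key, holder_id):
        self.pointers[self.root(key)].setdefault(key, set()).add(holder_id)

    def locate(self, key):
        return self.pointers[self.root(key)].get(key, set())
```

With `put`, the object itself moves to whichever node the overlay picks; with `publish`, only the holder's id is recorded there and the object stays where the application placed it.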


Storage Systems

Mnemosyne [Hand & Roscoe, IPTPS02]
- steganographic storage

PAST [Rowstron & Druschel, SOSP01]
- file-based storage substrate

CFS [Dabek, et al., SOSP01]
- single writer cooperative storage

Ivy [Muthitacharoen, et al., OSDI02]
- small group read/write storage

OceanStore [Kubiatowicz, et al., ASPLOS00, FAST03]
- global-scale persistent storage


Mnemosyne

Target:
- Data that requires privacy and plausible deniability
Uses:
- Tapestry as DHT
Basic idea:
- Compute n hashes for a block: h_0, h_1 = H(h_0), ..., h_{n-1} = H(h_{n-2})
- Store the (encrypted) block at the addresses h_0, ..., h_{n-1} (mod X, where X = size of store)
- Given h_0 and key, try to look up and decrypt each replica in turn (success if it passes the validity check)
- In a p2p overlay, use part of the hash value as a node address, the other part as the block address on that node
Importance:
- Simple. Only uses the basic get/put operators.
- ... but requires end nodes to obey block addresses
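The hash chain that yields the replica addresses can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for H, the function name `replica_addresses` is made up, and how h_0 is derived from the key is left to the caller.

```python
import hashlib


def replica_addresses(h0: bytes, n: int, store_size: int) -> list[int]:
    """Return the n block addresses h_0, H(h_0), H(H(h_0)), ... taken mod store_size.

    Assumption: H is SHA-256 here; the slide does not name the hash function.
    """
    addresses = []
    h = h0
    for _ in range(n):
        addresses.append(int.from_bytes(h, "big") % store_size)
        h = hashlib.sha256(h).digest()  # next link in the chain: h_{i+1} = H(h_i)
    return addresses
```

A reader who knows h_0 can regenerate the same list and try each address in turn; without h_0 the addresses look like arbitrary locations in the store.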


PAST

Target:
- Wide area heterogeneous storage (e.g., web)
Uses:
- Pastry as DHT
Basic Idea:
- Store a file at h = H(file); Lookup with h
- Replicate file at leaf-set of root (l nearest nodes in id-space)
- Cache file along lookup paths
- Deal with heterogeneity using virtual nodes and replica diversion
Importance:
- Graceful degradation under high utilization
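The placement rule (hash the file, then replicate on the l nodes nearest in id-space) might look like this in a toy setting. The 160-bit circular id space, SHA-1, and the helper names are assumptions for the sketch; virtual nodes, caching, and replica diversion are not modelled.

```python
import hashlib

ID_BITS = 160          # assumption: 160-bit ids, matching a SHA-1 digest
RING = 1 << ID_BITS


def file_id(content: bytes) -> int:
    """h = H(file), with SHA-1 standing in for H."""
    return int.from_bytes(hashlib.sha1(content).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Distance between two ids on a circular id space."""
    d = abs(a - b)
    return min(d, RING - d)


def replica_nodes(fid: int, node_ids: list[int], l: int) -> list[int]:
    """Pick the l node ids numerically closest to the file id."""
    return sorted(node_ids, key=lambda n: ring_distance(n, fid))[:l]
```

For instance, `replica_nodes(10, [1, 9, 12, 500], 2)` picks nodes 9 and 12, the two ids closest to 10.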


PAST Space Management


PAST Caching


CFS

Target:
- Single writer, multiple readers (e.g., FTP)
Uses:
- Chord as DHT
Basic Idea:
- FS implemented on top of DHash layer
- DHash replication, caching, load balancing same as PAST
- Secure updates and deletion using signed root block and cryptographic hashes to identify directory and file blocks
- Pre-fetch blocks of the same file/directory
Importance:
- “Real-life” evaluation comparable to FTP


CFS File System Structure


[Figure labels: public key, Root Block, File Block, B1, B2]


CFS “Real-Life” Evaluation

CFS Pair-wise TCP


Ivy

Target:
- Read/write storage for small groups (e.g., CVS)
Uses:
- Chord as DHT
Basic Idea:
- Implemented on top of DHash layer (identical to CFS)
- Each FS has a view consisting of n logs, one per writer
- Write operations go to personal log
- Reads reconstruct data by reading all logs in view; occasionally snapshot FS to prevent long traversals
- Consistency using version vectors (application resolvers for concurrent versions; e.g., created during partition)
Importance:
- Another “real-life” evaluation, but disappointing
- Practical model for read/write in a p2p environment
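How version vectors expose concurrent writes can be shown with a generic comparison routine. This is the textbook version-vector comparison rather than Ivy's actual code; the function name and the dict representation (writer name to log position) are assumptions.

```python
def compare(vv_a: dict, vv_b: dict) -> str:
    """Compare two version vectors mapping writer -> number of log entries seen.

    Returns "equal", "before", "after", or "concurrent" (from a's point of view).
    """
    writers = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(w, 0) > vv_b.get(w, 0) for w in writers)
    b_ahead = any(vv_b.get(w, 0) > vv_a.get(w, 0) for w in writers)
    if a_ahead and b_ahead:
        return "concurrent"  # neither saw all of the other's writes
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"
```

A "concurrent" result is the case the slide hands to application resolvers, for example two versions written on opposite sides of a partition.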


Ivy Log Structure

Log head

Log head










Ivy Wide Area Performance

Modified Andrew Benchmark



OceanStore

Target:
- Global storage as a “utility”
Uses:
- Tapestry as DOLR
Basic Idea:
- Use Tapestry for (all) object and service location.
- Writes go to an Inner-Ring, serialized using Byzantine Agreement
- Writes create new versions of blocks, which are permanently dispersed into archive using erasure codes
- Reads go to closest replica in a dissemination tree rooted at Inner-Ring
Importance:
- Wide area Byzantine commit
- Performance of strong crypto in critical path
- Caching in a DOLR (only participating nodes involved)


OceanStore Update Path


OceanStore Object Model


OceanStore Inner Ring Perf.


OceanStore Read Perf.

Archive Read

Streaming Reads from Replicas


Multicast Applications

Bayeux [Zhuang, et al., NOSSDAV01]
- Simple single tree per source on DOLR

Scribe [Rowstron, et al., NGC01, INFOCOM03]
- Simple single tree per source on DHT

SplitStream [Castro, et al., SOSP03]
- Multiple disjoint trees per source

i3 [Stoica, et al. SIGCOMM02]
- Internet Indirection Infrastructure (mobility, {multi,any}cast, service composition)


Bayeux

Target:
- Multimedia Streaming
Uses:
- Tapestry as DOLR
Basic Idea:
- Advertise session with fake file in Tapestry
- Clients join by routing message to source id (after learning of it by looking up the session)
- All intermediate routers on path join tree
- Support multiple roots by having multiple sources advertise a session (lookups converge to “closest”)
- Take advantage of routing redundancy to provide best performance (shortest link) / tolerate faults (predict link reliability)
Importance:
- Relatively simple (no “frills”) multicast on a DOLR


Scribe

Target:
- Event notification / pubsub systems (e.g., IM)
Uses:
- Pastry or CAN as DHTs
Basic Idea:
- Publications routed to root in Pastry
- Recursively forwarded to all children in tree
- Subscriptions cause all nodes on path to root to join tree
- When your parent dies repair by routing to a new parent
- More complex ways to load balance (e.g., make children into grandchildren) described in later JSAC article
Importance:
- Another simple multicast on a DHT
- Building block for more complex applications
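The join and publish steps can be sketched with a plain child table. This is a simplified model, not Scribe's implementation: the overlay route from subscriber to root is passed in as a list (in the real system Pastry would compute it), and the function names are made up for the example.

```python
def subscribe(children: dict, path_to_root: list) -> None:
    """Add every hop on [subscriber, hop1, ..., root] to the tree.

    `children` maps a node to the set of nodes it forwards to.
    """
    for child, parent in zip(path_to_root, path_to_root[1:]):
        already_forwarder = parent in children
        children.setdefault(parent, set()).add(child)
        if already_forwarder:
            break  # the rest of the path to the root is already in the tree


def publish(children: dict, root, deliver) -> None:
    """Forward a publication from the root down to every node in the tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        deliver(node)
        stack.extend(children.get(node, ()))
```

Two subscribers whose routes share a hop end up as siblings under that hop, so the root forwards each publication only once along the shared part of the path.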


SplitStream

Target:
- P2P streaming / bulk file transfer
Uses:
- Pastry
Basic Idea:
- Split content into k stripes
- Construct k interior-node disjoint Scribe trees
- Distribute one stripe per tree
- Receivers choose number of stripes to receive (e.g., trade off quality for inbound capacity)
- Limit out-degree of nodes with join-heuristics (later)
Importance:
- All nodes share in forwarding of data (w.h.p.)
- Nifty use of Pastry ids to construct forest (next slide)


SplitStream Forest Construction

- Notice that all interior nodes of a tree must have the same first digit in their node id
  - Pastry routing: first hop will match first digit
- Source sends stripes to k different trees
  - Root trees at nodes with different first digits
- If each digit is b bits, make k = 2^b stripes
  - Each node will be interior node of at most one tree (the tree that matches their first digit)
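The id trick can be made concrete for b = 4, where each digit is one hex character and k = 2^4 = 16. The helper names and the choice of padding stripe ids with zeros are assumptions for this sketch; only the "one distinct first digit per stripe" property comes from the slide.

```python
B = 4                      # bits per digit (assumption: hex digits)
K = 2 ** B                 # number of stripes, k = 2^b
HEX_DIGITS = "0123456789ABCDEF"


def stripe_ids(id_len: int = 4) -> list[str]:
    """One stripe (tree) id per possible first digit, zero-padded (hypothetical layout)."""
    return [digit + "0" * (id_len - 1) for digit in HEX_DIGITS[:K]]


def interior_stripe(node_id: str) -> str:
    """The only stripe tree in which this node can be an interior node."""
    first = node_id[0].upper()
    return next(s for s in stripe_ids(len(node_id)) if s[0] == first)
```

Because routing toward a stripe id matches its first digit on the first hop, a node such as 9AE2 can only forward inside the tree rooted near 9000 and is a leaf everywhere else.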


SplitStream Limiting Out-Degree

- If too many children, kick one out
- First, orphaned child tries “push-down”
  - Can I join a sibling?
  - And continue recursively on sibling’s children
- Second, use the spare capacity group
  - Independent Scribe multicast tree
  - Composed of nodes that have spare capacity
  - Orphan anycasts message to this group
  - Receiver of anycast starts DFS of spare capacity tree until it finds a node that has the desired stripe
  - Orphan joins that node
  - If in-degree = k, this never fails*


SplitStream Overhead

Forest Construction

Control Message Overhead under High Churn


i3 - Internet Indirection Infrastructure

Target:
- Rendezvous-based communication (IP indirection)
Uses:
- Chord
Basic Idea:
- Receivers insert triggers (id, receiver_id) into DHT
- Senders send to id, meet at triggers, which send to receivers
Supports:
- Mobility: reinsert your trigger when you move
- Multicast & anycast: use longest-prefix match on ids to build tree
- Service composition: use stacks of triggers, which act like source routing in IP
Importance:
- Very low level, best-effort service built on DHT
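The trigger mechanism can be illustrated with a single in-memory table. This is a deliberately reduced sketch: the class and method names are hypothetical, ids are matched exactly (the longest-prefix matching used for anycast is left out), and trigger stacks are not modelled.

```python
class TriggerTable:
    """Toy rendezvous point holding (id, receiver address) triggers."""

    def __init__(self):
        self.triggers = {}

    def insert(self, trigger_id, receiver_addr):
        self.triggers.setdefault(trigger_id, set()).add(receiver_addr)

    def remove(self, trigger_id, receiver_addr):
        self.triggers.get(trigger_id, set()).discard(receiver_addr)

    def send(self, trigger_id, data):
        """Return the (receiver, data) pairs a packet sent to trigger_id fans out to."""
        return [(addr, data) for addr in sorted(self.triggers.get(trigger_id, ()))]
```

Mobility is then a `remove` followed by an `insert` with the new address, and multicast falls out of several receivers inserting triggers with the same id; the sender's code does not change in either case.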


PIER [Huebsch, et al., VLDB03]

Target:
- In situ distributed querying (e.g., network monitoring)
Uses:
- CAN as DHT
Basic Idea:
- Tables named by (namespace, resourceID); e.g., (application, primary_key)
- Store tables in DHT keyed by this pair
- Lookup tuples by routing to a table(s)’ key and having the end nodes do an lscan for you
- Join N_R and N_S by creating a new namespace N_Q in DHT and rehashing tuples to N_Q which determines matches
Importance:
- Another simple multicast on a DHT
- Building block for more complex applications
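The rehash join can be sketched by simulating the N_Q namespace as a set of buckets, one per node. This is a generic hash join written for illustration, not PIER's code; the function name, the tuple layout, and the use of Python's built-in `hash` to pick a node are assumptions.

```python
from collections import defaultdict


def rehash_join(r_tuples, s_tuples, r_col, s_col, num_nodes):
    """Join R and S on r_col == s_col by rehashing both into a shared namespace."""
    # Each bucket plays the role of one node's slice of the new namespace N_Q.
    nq = defaultdict(lambda: {"R": [], "S": []})
    for t in r_tuples:
        nq[hash(t[r_col]) % num_nodes]["R"].append(t)
    for t in s_tuples:
        nq[hash(t[s_col]) % num_nodes]["S"].append(t)

    # Tuples with equal join values landed on the same node, so each node
    # can find its matches with a purely local comparison.
    results = []
    for bucket in nq.values():
        for r in bucket["R"]:
            for s in bucket["S"]:
                if r[r_col] == s[s_col]:
                    results.append((r, s))
    return results
```

The point of the rehash is that no node needs to see all of R or all of S; matching tuples are guaranteed to meet at the node responsible for their join value.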


PIER Performance


Misc. Applications

POST [Mislove, et al., HotOS03]
- Collaborative Applications

Approximate Object Location [Zhou, et al. Middleware03]
- Collaborative Spam Filtering


POST

Target:
- Toolbox for collaborative apps (e.g., email, IM, etc.)
Uses:
- Pastry as DHT
Basic Idea:
- Use PAST as storage substrate
- Use Scribe as notification system
- Assume certificate authority for assigning user IDs, keys
- Example: Email
  - Insert new mail into PAST (encrypted)
  - Notify recipient using Scribe (delegate if not online)
Importance:
- Use second level systems as substrate for more complex applications (see also OceanStore: email, nfs, web cache)


Approximate Object Location

Target:
- Collaborative filtering (e.g., Spam detection)
Uses:
- Tapestry as DOLR
Basic Idea:
- Calculate checksums of all strings of length L in message. Select N of them deterministically (“feature” vector)
- Two messages match if enough features match
- To mark spam, insert my node into Tapestry keyed by each feature
- To detect spam, lookup its features. Will get back a set of nodes that marked each feature as spam (“votes”).
Importance:
- Scary, but looking more and more useful. E.g., recent DoS attacks on RBLs.
- They have a plug-in for Outlook that works
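The feature-vector step can be sketched as below. The checksum (CRC-32) and the selection rule (keep the N numerically smallest checksums) are assumptions chosen only because they are deterministic; the slide does not say which checksum or selection rule is used.

```python
import zlib


def feature_vector(message: str, L: int, N: int) -> set[int]:
    """Checksum every length-L substring, then keep N of them deterministically."""
    data = message.encode()
    checksums = {zlib.crc32(data[i:i + L]) for i in range(len(data) - L + 1)}
    return set(sorted(checksums)[:N])  # assumption: "smallest N" as the selection rule


def matches(features_a: set[int], features_b: set[int], threshold: int) -> bool:
    """Two messages match if at least `threshold` features coincide."""
    return len(features_a & features_b) >= threshold
```

Each feature would then serve as a key: marking a message as spam publishes the marking node under every feature, and checking a message looks up its features and counts the nodes returned.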


API & Infrastructure Proposals

One Ring to Rule them All [Castro, et al. SIGOPS02]
- Bootstrapping multiple overlays

Common P2P API [Dabek, et al. IPTPS03]
- DHT/DOLR as a library

OpenHash [Karp, et al. IPTPS04*]
- DHT as a service



One Ring to Rule them All

Goal:
- Bootstrap multiple overlays
Basic Idea:
- Everyone joins a “universal” Pastry ring
- This ring implements PAST, Scribe, and distributed search (see Harren, et al., IPTPS02)
- Advertise your overlay service in the search engine
- Store your code and certificates in PAST
- Upgrades disseminated through Scribe
Importance:
- How one might use an overlay to manage overlays
- Interesting title for a Microsoft paper :)


Common P2P API

Goal:
- Common API for structured overlays
Basic Idea:
- First, described a common layer that both DHT and DOLR could be implemented on
- Second, looked at applications developed so far
  - See what abstractions can be derived
  - Described what DHT “library” functions might be
Importance:
- How much has to be exposed to application developers?
- Any DHT App can be implemented on any DHT


Common P2P API Classification


Common P2P API: API

void route(key,msg,nodeHint)
void forward(key,msg,nextHop)
void deliver(key,msg)
node[] localLookup(key,num,safeFlag)
node[] neighborSet(num)
node[] replicaSet(key,maxRank)
void update(node,joinedFlag)
bool range(node,rank,keyRange)


OpenHash

Goal:
- DHT as a single service multiple apps can use
Basic Idea:
- Some simple apps only require get/put. Support these “out of box”
- App operations can be classified as “endpoint” operators (at root/successor) or “hop-by-hop” operators (on path to root)
- Support endpoint operators
  - App specific code lives on nodes “outside” the main DHT
  - Route app specific requests only to nodes that have the app’s code
- Argue that there is no need to support hop-by-hop operators
  - Most functionality can be achieved another way
Importance:
- How one might deploy DHT as an active service
- Allow people other than academics to deploy these apps?


OpenHash ReDir Algorithm

rendezvous points for X find successor for k


Conclusion

- DHT Apps not going away
  - Are they still struggling to find a purpose?
  - Would any of these apps be better off not on top of a DHT?
- Using basic apps to build more complex ones:
  - CFS, Ivy build on DHash
  - POST, OneRing build on PAST, Scribe
  - SplitStream builds on Scribe
- Starting to notice that no one besides researchers is using DHTs
  - 3+ years of research...
  - How to make them useful to real people?