
Paper Survey of DHT

Distributed Hash Table

Usages

Directory service: a very small amount of information, such as a URI or metadata.

Storage: data such as files; immutable, just for download.

Database: each entry is small, but there is a large number of entries; mutable; needs special operations for queries.

Challenges

Immutable objects: latency, availability, query consistency.

Mutable objects: object consistency.

Latency

Query: different routing architectures (Chord, Tapestry, Pastry, Kademlia, CAN, …); recursive or iterative lookup; proximity neighbor route; parallel queries; routing table size.

Fetch: transport protocol; proximity neighbor selection; cache; distributed object.

Query: Routing Architectures

Routing complexity: O(log n), O(d), O(1), …

Principle: each peer has a unique digest, and each object has a digest. The object is put on the peer with the closest digest.

The well-known designs are O(log n). O(1): with cache.
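The placement principle above can be sketched in a few lines. This is a minimal illustration, not the method of any particular DHT: it assumes SHA-1 digests and an XOR distance (the metric Kademlia uses; Chord, for example, uses clockwise ring distance instead), and it scans a full peer list, whereas a real DHT finds the peer through routing. The names `digest` and `responsible_peer` are hypothetical.

```python
import hashlib


def digest(name: str) -> int:
    """Map a peer name or object key to a 160-bit integer digest (SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def responsible_peer(object_key: str, peers: list[str]) -> str:
    """Return the peer whose digest is closest to the object's digest.

    Closeness is measured here with XOR distance, which is an assumption
    of this sketch; other architectures define distance differently.
    """
    if not peers:
        raise ValueError("at least one peer is required")
    key = digest(object_key)
    return min(peers, key=lambda peer: digest(peer) ^ key)


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c"]
    print(responsible_peer("some-file.txt", peers))
```

Because every node computes the same digests, any node can determine which peer is responsible for an object without a central directory.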

Query: Recursive or Iterative

Recursive: the query is forwarded recursively from node to node.
Theoretically about 2 times faster than iterative.
Primary parameters: base; # of successors.
Persistent problem.

Query: Recursive or Iterative

Iterative: the originator forwards the query itself, one hop at a time.
Not very slow in practice.
Primary parameters: # of parallel queries; routing table tree.
Learns new neighbors easily; exchanges information with other peers; flexible.
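The "2 times faster theoretically" claim can be checked by counting one-way message delays. The sketch below assumes every one-way message takes the same time and that the responsible node replies directly to the originator in the recursive case; both are simplifying assumptions, and the function names are hypothetical.

```python
def recursive_one_way_delays(hops: int) -> int:
    """Recursive lookup: the query is passed along `hops` nodes,
    then the last node answers the originator directly (one more message)."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return hops + 1


def iterative_one_way_delays(hops: int) -> int:
    """Iterative lookup: the originator contacts each of the `hops` nodes
    itself and waits for each reply, so every hop costs a full round trip."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return 2 * hops


if __name__ == "__main__":
    for hops in (1, 5, 10, 20):
        r = recursive_one_way_delays(hops)
        i = iterative_one_way_delays(hops)
        print(f"hops={hops:2d} recursive={r:2d} iterative={i:2d} ratio={i / r:.2f}")
```

Under these assumptions the ratio 2h / (h + 1) approaches 2 as the hop count h grows, which matches the theoretical factor quoted on the slide.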

Query: Proximity Neighbor Route

Route via a node with a smaller delay.
Small delay -> small timeout.
Timeout estimation: TCP > Vivaldi > fixed.
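Assuming "TCP" in the ranking above refers to TCP-style timeout estimation from measured round-trip times, the idea can be sketched as follows. The smoothing constants (1/8, 1/4) and the factor 4 are the ones used by standard TCP retransmission timers; the class name, the default timeout, and the per-neighbor usage are assumptions of this sketch.

```python
class RttEstimator:
    """Per-neighbor timeout estimate from smoothed RTT and RTT variance."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25, k: float = 4.0):
        self.alpha = alpha    # weight of a new sample in the smoothed RTT
        self.beta = beta      # weight of a new sample in the RTT variance
        self.k = k            # variance multiplier in the timeout
        self.srtt = None      # smoothed round-trip time, seconds
        self.rttvar = None    # smoothed RTT deviation, seconds

    def observe(self, rtt: float) -> None:
        """Fold one measured round-trip time into the estimate."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the variance first, using the previous smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt

    def timeout(self, default: float = 1.0) -> float:
        """Return the current timeout, or `default` before any sample."""
        if self.srtt is None:
            return default
        return self.srtt + self.k * self.rttvar


if __name__ == "__main__":
    est = RttEstimator()
    for sample in (0.10, 0.12, 0.09, 0.11):
        est.observe(sample)
    print(f"timeout = {est.timeout():.3f} s")
```

A neighbor with a small, stable delay ends up with a small timeout, so a failed neighbor is detected sooner than with a fixed, conservative timeout.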

Query: Proximity Neighbor Route

Measurement methods: global sampling; neighbors' neighbors; neighbors' inverse; recursive sampling.

Query: others

Parallel query: faster; has a partial PNS property; persistent; more traffic.

Large routing table: easy to find a closer node locally.

Fetch: Cache

Cache objects on nodes closer to the primary one.

The number of nodes that cache an object depends on the popularity of the object.

Average query hops can be reduced to a constant number, O(1).

Hard to apply to mutable objects. When churn is considered, bandwidth consumption is higher.

Fetch: Distributed Object

Split the object into small pieces and put them on different nodes.

Recovers faster; downloads faster; hard to maintain; only for immutable data.

Fetch: Transport Protocol

Striped Transport Protocol: UDP; window control; retransmission.

Availability

Replication: reactive / proactive; eager / lazy repair.

Erasure coding.

Load balance is broken: high correlation between uptime and storage.

Maintenance traffic problem.

Availability: Replicate

Reactive: duplicate when a copy is lost; consumes a lot of bandwidth in a short time; better when churn is low.

Proactive: duplicate continually; consumes a constant, small amount of bandwidth; needs availability prediction and redundancy management; bandwidth usage is predictable.

Availability: Replicate

Temporary / permanent churn.
Availability <-> durability.
Is it possible to achieve 100% availability and/or durability?

Eager repair: duplicate immediately.

Lazy repair: duplicate after a timeout; needs a good choice of timeout; reintegrates returning replicas.
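The difference between eager and lazy repair comes down to how long a silent replica holder is still counted as alive. The sketch below is a simplified illustration under that assumption; the function name, the `last_seen` bookkeeping, and the numbers in the example are hypothetical.

```python
def replicas_to_create(last_seen: dict[str, float], now: float,
                       timeout: float, target: int) -> int:
    """Return how many new copies to create for one object.

    last_seen maps each replica holder to the time it was last reachable.
    A holder silent for more than `timeout` is treated as permanently gone.
    timeout = 0 behaves like eager repair; a larger timeout is lazy repair.
    """
    alive = [node for node, seen in last_seen.items() if now - seen <= timeout]
    return max(0, target - len(alive))


if __name__ == "__main__":
    # Hypothetical state: "c" has been unreachable for 60 time units.
    last_seen = {"a": 100.0, "b": 95.0, "c": 40.0}
    for timeout in (0.0, 30.0, 60.0):
        n = replicas_to_create(last_seen, now=100.0, timeout=timeout, target=3)
        print(f"timeout={timeout:5.1f} -> create {n} new replica(s)")
```

If a holder returns before the timeout expires, its `last_seen` entry is simply refreshed and it counts again, which is how returning replicas are reintegrated without copying any data. A timeout that is too short wastes bandwidth on temporary churn; one that is too long risks losing the object.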

Availability: Erasure Coding

Matters more for larger objects.
Saves storage and bandwidth.
Under high churn, the bandwidth consumption is still not acceptable.
Complex maintenance.
Download latency is heterogeneous.
Only for immutable data.
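The storage saving of erasure coding can be made concrete with the standard availability formulas. The sketch assumes each node is available independently with probability p, which real systems only approximate; the function names are hypothetical. With k full replicas the object is available if at least one replica is up; with an (m, n) erasure code it is available if at least m of the n fragments are up.

```python
from math import comb


def replication_availability(p: float, k: int) -> float:
    """Probability that at least one of k full replicas is reachable."""
    return 1 - (1 - p) ** k


def erasure_availability(p: float, m: int, n: int) -> float:
    """Probability that at least m of n fragments are reachable."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(m, n + 1))


if __name__ == "__main__":
    p = 0.9  # assumed per-node availability
    # Both configurations below store 2x the object size.
    print("2 replicas   :", replication_availability(p, 2))
    print("(8, 16) code :", erasure_availability(p, 8, 16))
```

The storage overhead is k for replication and n / m for erasure coding, so the two can be compared at equal storage cost. The formulas say nothing about repair traffic, which is where the churn and maintenance concerns above come in.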

Query Consistency

If a digest-object mapping exists, then the result of a query for that digest must be that object.

Weakly consistent KBR (key-based routing): eventual consistency; most existing DHTs.

Strongly consistent KBR: causal consistency; strong consistency.

Solution: route by W-KBR to a group, then use S-KBR within the group.

Mutable DHT

Objects stored in the DHT are mutable: insert, update, delete.

Churn -> replicas: a new challenge …

Object Consistency

For immutable data: consistency may still be a concern for security reasons; Merkle tree.

For mutable data: consensus algorithm; distributed algorithm for data consistency; quorum algorithm; read / write locks.
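For the immutable case, a Merkle tree lets a downloader verify fetched blocks against a single trusted root hash. The following is a minimal sketch, assuming SHA-256 and the convention of duplicating the last node on odd-sized levels; other conventions exist, and the function name is hypothetical.

```python
import hashlib


def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute the Merkle root of a non-empty list of data blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    # Leaf level: hash every block.
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last node with itself.
            level.append(level[-1])
        # Parent level: hash each adjacent pair of child hashes.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    blocks = [b"block-0", b"block-1", b"block-2"]
    print(merkle_root(blocks).hex())
```

Any change to any block changes the root, so a peer that serves a corrupted or forged block is detected as soon as the recomputed root fails to match the one the downloader already trusts.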

Pitfalls

Different kinds of P2P systems have different properties.

Lack of new real traces and of a standard simulation platform.

References

Efficient Replica Maintenance for Distributed Storage Systems
Proactive replication for data durability
On object Maintenance in Peer-to-Peer systems
Enforcing Routing Consistency in Structured Peer-to-peer Overlays: Should We and Could We?
High Availability in DHTs: Erasure Coding vs. Replication
Toward Fault-tolerant Atomic Data Access in Mutable Distributed Hash Tables
Kademlia: A Peer-to-peer Information System Based on the XOR Metric
Total Recall: System Support for Automated Availability Management
Designing a DHT for low latency and high throughput
Fallacies in evaluating decentralized systems
Anatomy of a P2P Content Distribution system with Network Coding
Comparing the performance of distributed hash tables under churn
EpiChord: Parallelizing the Chord Lookup Algorithm with Reactive Routing State management
Bandwidth-efficient management of DHT routing tables
Improving Lookup Performance over a Widely-Deployed DHT
Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation
Handling Churn in a DHT
