Paper Survey of DHT
Distributed Hash Table
Usages
  Directory service
    Very small amount of information, such as URI, metadata, …
  Storage
    Data, such as files, …
    Immutable, just for download
  Database
    Each entry is small, but the number of entries is large
    Mutable
    Special operations for query
Challenges
  Immutable
    Latency
    Availability
    Query consistency
  Mutable
    Object consistency
Latency
  Query
    Different routing architectures: Chord, Tapestry, Pastry, Kademlia, CAN, …
    Recursive or iterative
    Proximity neighbor routing
    Parallel queries
    Routing table size
  Fetch
    Transport protocol
    Proximity neighbor selection
    Cache
    Distributed object
Query: Routing Architectures
  Routing complexity
    O(log n), O(d), O(1), …
  Principle
    Each peer has a unique digest
    Each object has a digest
    Put the object on the peer with the closest digest
  The well-known ones are O(log n)
  O(1) with cache
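The placement principle above can be sketched in a few lines. This is a minimal illustration, assuming SHA-1 digests and the XOR distance used by Kademlia; other architectures define "closest" differently (Chord, for example, uses the clockwise successor). The peer names and the helper names `digest` and `responsible_peer` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> int:
    """Map arbitrary bytes to a 160-bit integer identifier (SHA-1)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def responsible_peer(object_id: int, peer_ids: list) -> int:
    """Return the peer ID with the smallest XOR distance to the object ID."""
    return min(peer_ids, key=lambda pid: pid ^ object_id)


# Hypothetical peers, identified by the digest of their names.
peers = {digest(name.encode()): name
         for name in ("peer-a", "peer-b", "peer-c", "peer-d")}

key = digest(b"some-file.txt")
owner = responsible_peer(key, list(peers))
print("object is stored on", peers[owner])
```

Every peer that runs the same function over the same peer set arrives at the same owner, which is what lets a query be routed without a central index.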
Query: Recursive or Iterative
  Query is forwarded recursively
    Theoretically 2 times faster than iterative
  Primary parameters
    Base
    # of successors
  Persistent problem
Query: Recursive or Iterative
  Query is forwarded iteratively
    Not very slow in practice
  Primary parameters
    # of parallel queries
  Routing table tree
    Learning new neighbors easily
    Exchange information with other peers
  Flexible
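The "2 times faster theoretically" claim for recursive forwarding can be checked with simple arithmetic. This is a sketch under two simplifying assumptions that are not in the slides: every one-way message leg costs the same delay, and in the recursive case the last node replies directly to the requester. In the other style (iterative), the requester contacts every hop itself. The function names are illustrative.

```python
def iterative_legs(hops: int) -> int:
    """Requester contacts each hop itself: one request and one reply per hop."""
    return 2 * hops


def recursive_legs(hops: int) -> int:
    """Each hop forwards the query once; the last node sends one reply back."""
    return hops + 1


for hops in (2, 5, 10, 20):
    ratio = iterative_legs(hops) / recursive_legs(hops)
    print(f"{hops:>2} hops: iterative={iterative_legs(hops)} legs, "
          f"recursive={recursive_legs(hops)} legs, ratio={ratio:.2f}")
```

The ratio 2h / (h + 1) approaches 2 as the hop count h grows, so the factor of two is an upper bound rather than a typical value for short paths.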
Query: Proximity Neighbor Route
  Route via a node with smaller delay
  Small delay -> small timeout
    TCP > Vivaldi > fixed
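A TCP-style timeout adapts to the measured delay of each neighbor, which is how a small delay leads to a small timeout. The sketch below follows the smoothed round-trip estimator of RFC 6298 (gains 1/8 and 1/4, timeout = SRTT + 4 * RTTVAR) and leaves out the clock-granularity term and the one-second lower bound from that RFC. The class name and the sample delays are made up for illustration, and reading "TCP" on the slide as this estimator is an assumption.

```python
class RttEstimator:
    """Per-neighbor timeout estimator in the style of RFC 6298 (simplified)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self):
        self.srtt = None    # smoothed round-trip time, seconds
        self.rttvar = None  # round-trip time variation, seconds

    def update(self, sample: float) -> None:
        """Fold one measured round-trip time (seconds) into the estimate."""
        if self.srtt is None:
            # First measurement initializes both values.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample

    def timeout(self) -> float:
        """Timeout to use for the next request to this neighbor."""
        if self.srtt is None:
            raise ValueError("no RTT sample yet")
        return self.srtt + 4 * self.rttvar


near, far = RttEstimator(), RttEstimator()
for rtt in (0.020, 0.022, 0.019):   # hypothetical nearby neighbor
    near.update(rtt)
for rtt in (0.180, 0.210, 0.190):   # hypothetical distant neighbor
    far.update(rtt)
print(f"near timeout: {near.timeout():.3f}s, far timeout: {far.timeout():.3f}s")
```

A fixed timeout would have to be large enough for the slowest neighbor, so a failed nearby node would be detected much later than with a per-neighbor estimate.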
Query: Proximity Neighbor Route
  Measurement methods
    Global sampling
    Neighbor’s neighbors
    Neighbor’s inverse
    Recursive sampling
Query: Others
  Parallel query
    Faster
    With partial PNS property
    Persistent
    More traffic
  Large routing table
    Easy to find a closer node locally
Fetch: Cache
  Cache objects on nodes closer to the primary one
  # of nodes that cache an object depends on the popularity of the object
  Average query hops can be reduced to a constant number (O(1))
  Hard to apply to mutable objects
  Under churn, consumes more bandwidth
Fetch: Distributed Object
  Split the object into small pieces and put them on different nodes
  Recover faster
  Download faster
  Hard to maintain
  Only for immutable data
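Splitting an object and spreading the pieces can be sketched as follows. This illustration rests on several assumptions that the slides do not state: fixed-size pieces, each piece named by its SHA-1 digest, and placement on the peer with the closest ID under XOR distance. The piece size, peer names, and function names are hypothetical.

```python
import hashlib


def split(data: bytes, piece_size: int) -> list:
    """Cut the object into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_id(piece: bytes) -> int:
    """Name a piece by the SHA-1 digest of its content."""
    return int.from_bytes(hashlib.sha1(piece).digest(), "big")


def closest(peer_ids, key: int) -> int:
    """Peer with the smallest XOR distance to the key."""
    return min(peer_ids, key=lambda pid: pid ^ key)


def place(pieces, peer_ids):
    """Store each piece on its closest peer; return (manifest, storage)."""
    storage = {pid: {} for pid in peer_ids}
    manifest = []  # ordered piece IDs, needed to reassemble the object
    for piece in pieces:
        key = piece_id(piece)
        storage[closest(peer_ids, key)][key] = piece
        manifest.append(key)
    return manifest, storage


def fetch(manifest, storage) -> bytes:
    """Look up every piece on its closest peer and join them in order."""
    return b"".join(storage[closest(storage, key)][key] for key in manifest)


peer_ids = [piece_id(name.encode()) for name in ("peer-a", "peer-b", "peer-c")]
original = b"The quick brown fox jumps over the lazy dog" * 10
manifest, storage = place(split(original, 64), peer_ids)
print(fetch(manifest, storage) == original)
```

Because each piece is addressed by the digest of its content, changing a piece changes its name, which is one way to see why the scheme suits immutable data only.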
Fetch: Transport Protocol
  Striped Transport Protocol
    UDP
    Window control
    Retransmission
Availability
  Replicate
    Reactive / proactive
    Eager / lazy repair
  Erasure coding
  Load balance is broken
    High correlation between uptime and storage
  Maintenance traffic problem
Availability: Replicate
  Reactive
    Duplicate when a copy is lost
    Consumes lots of bandwidth in a short time
    When churn is low, reactive is better
  Proactive
    Duplicate continually
    Consumes a constant and small amount of bandwidth continually
    Needs availability prediction and redundancy management
    Bandwidth usage is predictable
Availability: Replicate
  Temporary / permanent churn
    Availability <-> Durability
    Achieve 100% availability and/or durability?
  Eager repair
    Duplicate immediately
  Lazy repair
    Duplicate after timeout
    Needs a good choice of timeout
    Reintegrating returning replicas
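The difference between eager and lazy repair comes down to how long a silent replica holder is tolerated before a new copy is made. A minimal sketch, assuming each holder's last contact time is known and that a repair simply tops the replica count back up to a target; the function name, parameters, and numbers are illustrative.

```python
def copies_to_create(last_seen: dict, now: float, timeout: float, target: int) -> int:
    """Number of new replicas needed to get back to `target` live copies.

    A holder counts as live if it was heard from within `timeout` seconds.
    timeout=0 models eager repair; a larger timeout models lazy repair.
    """
    live = sum(1 for seen in last_seen.values() if now - seen <= timeout)
    return max(0, target - live)


# Hypothetical replica holders and the time each was last heard from.
last_seen = {"node-a": 100.0, "node-b": 97.0, "node-c": 40.0}

eager = copies_to_create(last_seen, now=100.0, timeout=0.0, target=3)
lazy = copies_to_create(last_seen, now=100.0, timeout=30.0, target=3)
print(eager, lazy)  # prints: 2 1
```

If node-b was only temporarily offline, the eager policy has paid for a copy that becomes surplus once node-b returns, while the lazy policy waited it out. Choosing the timeout trades that saved bandwidth against the risk of running with too few copies.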
Availability: Erasure Coding
  Matters more for larger objects
  Saves storage and bandwidth
  For high churn, the bandwidth consumption is still not acceptable
  Complex maintenance
  Download latency is heterogeneous
  Only for immutable data
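The storage saving can be made concrete with a standard simplification that is not stated in the slides: assume each node is online independently with probability p. Under that assumption, an object with k full replicas is available with probability 1 - (1 - p)^k, while an object coded into n fragments, any m of which suffice, is available with probability sum over i from m to n of C(n, i) p^i (1 - p)^(n - i). The function names and example parameters below are illustrative.

```python
from math import comb


def replication_availability(p: float, k: int) -> float:
    """Probability that at least one of k full replicas is online."""
    return 1 - (1 - p) ** k


def erasure_availability(p: float, m: int, n: int) -> float:
    """Probability that at least m of n coded fragments are online."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(m, n + 1))


p = 0.7  # hypothetical per-node availability
# Both settings store twice the object size in total.
print(f"2 replicas:        {replication_availability(p, 2):.4f}")
print(f"8-of-16 fragments: {erasure_availability(p, 8, 16):.4f}")
```

The comparison covers storage and availability only. It says nothing about the repair traffic or the maintenance complexity listed above, which depend on churn.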
Query Consistency
  If a digest-object mapping exists, a query must return it
  Weakly consistent KBR
    Eventual consistency
    Most existing DHTs
  Strongly consistent KBR
    Causal consistency
    Strong consistency
  Solution
    Route by W-KBR to a group
    S-KBR within a group
Mutable DHT
  Objects stored in the DHT are mutable
    Insert, update, delete
  Churn -> Replica
  New challenge …
Object Consistency
  For immutable data
    It may still be an issue for security reasons
    Merkle tree
  For mutable data
    Consensus algorithm
    Distributed algorithm for data consistency
    Quorum algorithm
    Read / write locks
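A quorum algorithm can be sketched with N replicas, a write quorum W, and a read quorum R chosen so that R + W > N, which forces every read set to overlap the latest write set. The toy below is an illustration under heavy assumptions: replicas are in-memory dictionaries, version numbers are supplied by the caller, and there is no failure handling or locking. All names are hypothetical.

```python
class QuorumStore:
    """Toy quorum-replicated key-value store (no networking, no failures)."""

    def __init__(self, n: int, w: int, r: int):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]  # each maps key -> (version, value)

    def write(self, key, value, version: int, replica_ids) -> None:
        """Apply a versioned write to the given replicas (at least W of them)."""
        if len(set(replica_ids)) < self.w:
            raise ValueError("write quorum not reached")
        for i in set(replica_ids):
            current = self.replicas[i].get(key)
            if current is None or current[0] < version:
                self.replicas[i][key] = (version, value)

    def read(self, key, replica_ids):
        """Ask the given replicas (at least R) and return the newest value."""
        if len(set(replica_ids)) < self.r:
            raise ValueError("read quorum not reached")
        answers = [self.replicas[i][key] for i in set(replica_ids)
                   if key in self.replicas[i]]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1", version=1, replica_ids=[0, 1])
store.write("k", "v2", version=2, replica_ids=[1, 2])
# Replica 0 still holds the stale v1, but any two replicas include a fresh one.
print(store.read("k", [0, 1]), store.read("k", [0, 2]))  # prints: v2 v2
```

Raising W makes writes more expensive and reads cheaper, and the reverse for R. Under churn the set of reachable replicas changes, which is part of what makes mutable objects harder than immutable ones.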
Pitfalls
  Different kinds of P2P systems have different properties
  Lack of
    New real traces
    Standard simulation platform
References
  Efficient Replica Maintenance for Distributed Storage Systems
  Proactive replication for data durability
  On object Maintenance in Peer-to-Peer systems
  Enforcing Routing Consistency in Structured Peer-to-peer Overlays: Should We and Could We?
  High Availability in DHTs: Erasure Coding vs. Replication
  Toward Fault-tolerant Atomic Data Access in Mutable Distributed Hash Tables
  Kademlia: A Peer-to-peer Information System Based on the XOR Metric
  Total Recall: System Support for Automated Availability Management
  Designing a DHT for low latency and high throughput
  Fallacies in evaluating decentralized systems
  Anatomy of a P2P Content Distribution system with Network Coding
  Comparing the performance of distributed hash tables under churn
  EpiChord: Parallelizing the Chord Lookup Algorithm with Reactive Routing State management
  Bandwidth-efficient management of DHT routing tables
  Improving Lookup Performance over a Widely-Deployed DHT
  Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation
  Handling Churn in a DHT