TRANSCRIPT
Beehive: Achieving O(1) Lookup Performance in P2P Overlays for Zipf-like Query Distributions
Venugopalan Ramasubramanian (Rama) and Emin Gün Sirer
Cornell University
introduction
caching is widely used to improve latency and to decrease overhead
passive caching: caches distributed throughout the network store objects that are encountered
not well-suited for a large class of applications
problems with passive caching
no performance guarantees
heavy-tail effect: a large percentage of queries go to unpopular objects
ad-hoc heuristics for cache management
introduces coherency problems: difficult to locate all copies, weak consistency model
overview of beehive
general replication framework for structured DHTs: decentralization, self-organization, resilience
properties:
high performance: O(1) average lookup time
scalable: minimize the number of replicas and reduce storage, bandwidth, and network load
adaptive: promptly respond to changes in popularity (flash crowds)
key intuition
tunable latency: adjust the number of objects replicated at each level
fundamental space-time tradeoff
[figure: ring of base-4 node IDs (2012, 0021, 0112, 0122) illustrating prefix-based replication levels]
analytical model
optimization problem: minimize the total number of replicas, s.t. average lookup performance ≤ C
configurable target lookup performance: continuous range, sub one-hop
minimizing the number of replicas decreases storage and bandwidth overhead
analytical model
zipf-like query distributions with parameter α: number of queries to the rth most popular object ∝ 1/r^α
fraction of queries for the m most popular objects (of M): (m^(1-α) - 1) / (M^(1-α) - 1)
level of replication: nodes sharing i prefix-digits with the object give i-hop lookup latency; a level-i object is replicated on N/b^i nodes
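As a quick check of the closed form above, a minimal sketch (function name is mine, not from the talk) computing the fraction of queries absorbed by the m most popular objects:

```python
# Sketch: fraction of queries for the m most popular of M objects under a
# Zipf-like distribution with parameter alpha (closed form from the slide).
def zipf_fraction(m: int, M: int, alpha: float) -> float:
    # (m^(1-alpha) - 1) / (M^(1-alpha) - 1), valid for 0 < alpha < 1
    return (m ** (1 - alpha) - 1) / (M ** (1 - alpha) - 1)

# With the MIT DNS trace parameter (alpha = 0.91), even the 1000 most
# popular of 40960 objects capture only a bit over half of all queries,
# which is the heavy tail that defeats passive caching.
print(zipf_fraction(1000, 40960, 0.91))
```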
optimization problem
minimize (storage/bandwidth): x_0 + x_1/b + x_2/b^2 + … + x_{K-1}/b^{K-1}
such that (average lookup time is at most C hops): K - (x_0^(1-α) + x_1^(1-α) + x_2^(1-α) + … + x_{K-1}^(1-α)) ≤ C
and x_0 ≤ x_1 ≤ x_2 ≤ … ≤ x_{K-1} ≤ 1
b: base; K = log_b(N); x_i: fraction of objects replicated at level i
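Both sides of the optimization can be written down directly; a minimal sketch (helper names are mine, not Beehive's API) evaluating the objective and the latency constraint for a candidate replication vector:

```python
# Sketch: x[i] is the fraction of objects replicated at level i,
# with x[0] <= x[1] <= ... <= x[K-1] <= 1.
def replicas_cost(x, b):
    # objective: x0 + x1/b + x2/b^2 + ... (proportional to total replicas)
    return sum(xi / b ** i for i, xi in enumerate(x))

def avg_lookup_hops(x, alpha):
    # constraint LHS: K - (x0^(1-alpha) + ... + x_{K-1}^(1-alpha))
    K = len(x)
    return K - sum(xi ** (1 - alpha) for xi in x)

# full replication at every level gives 0-hop lookups at maximum cost:
print(avg_lookup_hops([1.0, 1.0], 0.91))  # 0.0
print(replicas_cost([1.0, 1.0], 32))      # 1.03125
```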
optimal closed-form solution
x*_i = [ d^i (K' - C) / (1 + d + … + d^{K'-1}) ]^{1/(1-α)}, for 0 ≤ i ≤ K'-1
x*_i = 1, for K' ≤ i ≤ K
where d = b^{(1-α)/α}
K' (typically 2 or 3) is determined by setting x*_{K'-1} ≤ 1, i.e. d^{K'-1} (K' - C) / (1 + d + … + d^{K'-1}) ≤ 1
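The closed form fits in a few lines of code; a sketch under the slide's definitions (function name and the K'-selection loop are my reading: take the largest K' satisfying x*_{K'-1} ≤ 1 and fully replicate the remaining levels):

```python
def beehive_solution(C: float, alpha: float, b: int, K: int):
    # Sketch of the closed-form solution: returns [x*_0, ..., x*_{K-1}].
    d = b ** ((1 - alpha) / alpha)
    for Kp in range(K, 0, -1):                    # try the largest K' first
        denom = sum(d ** j for j in range(Kp))    # 1 + d + ... + d^(K'-1)
        if d ** (Kp - 1) * (Kp - C) / denom <= 1:  # x*_{K'-1} <= 1 holds
            x = [(d ** i * (Kp - C) / denom) ** (1 / (1 - alpha))
                 for i in range(Kp)]
            return x + [1.0] * (K - Kp)           # levels >= K' fully replicated
    return [1.0] * K

# e.g. N = 1024 nodes, b = 32 (so K = 2), MIT-trace alpha, target C = 1 hop:
x = beehive_solution(1.0, 0.91, 32, 2)
```

By construction the constraint is tight: the resulting vector gives an average lookup time of exactly C hops.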
beehive: system overview
estimation: popularity of objects and the zipf parameter, via local measurement and limited aggregation
replication: apply the analytical model independently at each node; push new replicas to nodes at most one hop away
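The talk does not spell out the estimation step; one plausible sketch (my assumption, not Beehive's actual protocol) fits the Zipf parameter from locally measured access counts by least squares in log-log space:

```python
import math

def estimate_zipf_alpha(access_counts):
    # Under a Zipf-like distribution, count ~ rank^(-alpha), so a linear
    # fit of log(count) against log(rank) has slope -alpha.
    counts = sorted(access_counts, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```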
beehive replication protocol
[figure: replication of object 0121. Its home node E holds the level-3 replica (prefix 012*); level-2 replicas sit on the nodes with prefix 01* (E, B, I); level-1 replicas sit on nodes A through I, all with prefix 0*]
mutable objects
leverage the underlying structure of the DHT: the replication level indicates the locations of all the replicas
proactive propagation to all nodes from the home node: the home node sends to its one-hop neighbors with matching prefix-digits, and level-i nodes send to level-(i-1) nodes
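Because level-i replicas live exactly on the nodes sharing i prefix digits with the object, an update can enumerate every replica holder from the replication level alone; a small sketch (helper names are mine):

```python
def matching_prefix_digits(node_id: str, object_id: str) -> int:
    # number of leading digits the two identifiers share
    n = 0
    for a, b in zip(node_id, object_id):
        if a != b:
            break
        n += 1
    return n

def replica_holders(object_id: str, level: int, node_ids):
    # an object replicated at level i is stored on every node whose ID
    # shares at least i prefix digits with it (N / b^i nodes on average)
    return [n for n in node_ids
            if matching_prefix_digits(n, object_id) >= level]

# e.g. object 0121 from the protocol slide, with a few base-4 node IDs:
nodes = ["0122", "0112", "0021", "2012"]
print(replica_holders("0121", 2, nodes))  # ['0122', '0112']
```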
implementation and evaluation
implemented using Pastry as the underlying DHT
evaluated using a real DNS workload: the MIT DNS trace (zipf parameter 0.91), 1024 nodes, 40960 objects
compared with passive caching on Pastry
main properties evaluated: lookup performance; storage and bandwidth overhead; adaptation to changes in query distribution
evaluation: lookup performance
passive caching is not very effective because of the heavy-tail query distribution and mutable objects
beehive converges to the target of 1 hop
evaluation: overhead (bandwidth and storage)
average number of replicas per node:
Pastry: 40
PC-Pastry: 420
Beehive: 380
Cooperative Domain Name System (CoDoNS)
replacement for legacy DNS
secure authentication through DNSSEC
incremental deployment path: completely transparent to clients; uses legacy DNS to populate resource records on demand
deployed on PlanetLab
advantages of CoDoNS
higher performance than legacy DNS: median latency of 7 ms for CoDoNS (on PlanetLab) vs. 39 ms for legacy DNS
resilience against denial-of-service attacks
self-configuration after host and network failures
fast update propagation
conclusions
model-driven proactive caching: O(1) lookup performance with an optimal number of replicas
beehive: a general replication framework for structured overlays with uniform fan-out
high performance, resilience, improved availability
well-suited for latency-sensitive applications
www.cs.cornell.edu/people/egs/beehive
typical values of zipf parameter
MIT DNS trace: α = 0.91
Web traces:
trace:  Dec   UPisa  FuNet  UCB   Quest  NLANR
α:      0.83  0.84   0.84   0.83   0.88   0.90
comparative overview of structured DHTs
DHT: lookup performance
CAN: O(dN^(1/d))
Chord, Kademlia, Pastry, Tapestry, Viceroy: O(log N)
de Bruijn graphs (Koorde): O(log N / log log N)
Kelips, Salad, [Gupta, Liskov, Rodriguez], [Mizrak, Cheng, Kumar, Savage]: O(1)
O(1) structured DHTs
DHT: lookup performance, routing state
Salad: d, O(dN^(1/d))
[Mizrak, Cheng, Kumar, Savage]: 2, √N
Kelips: 1, √N (√N-fold replication)
[Gupta, Liskov, Rodriguez]: 1, N
security issues in beehive
underlying DHT: corruption in routing tables [Castro, Druschel, Ganesh, Rowstron, Wallach]
beehive: misrepresentation of popularity; remove outliers
application: corruption of data; certificates (e.g., DNSSEC)