TRANSCRIPT
Implementing Declarative Overlays
Boon Thau Loo [1], Tyson Condie [1], Joseph M. Hellerstein [1,2], Petros Maniatis [2], Timothy Roscoe [2], Ion Stoica [1]
[1] University of California at Berkeley, [2] Intel Research Berkeley
P2
Overlays Everywhere…
Overlay networks are widely used today:
- Routing and forwarding component of large-scale distributed systems
- Provide new functionality over existing infrastructure
Many examples, variety of requirements:
- Packet delivery: Multicast, RON
- Content delivery: CDNs, P2P file sharing, DHTs
- Enterprise systems: MS Exchange
Overlay networks are an integral part of many large-scale distributed systems.
Problem
Non-trivial to design, build, and deploy an overlay correctly. Iterative design process:
- Desired properties
- Distributed algorithms and protocols
- Simulation
- Implementation
- Deployment
- Repeat…
Each iteration takes significant time and requires a variety of expertise.
The Goal of P2
Make overlay development more accessible:
- Focus on algorithms and protocol designs, not the implementation
Tool for rapid prototyping of new overlays:
- Specify the overlay network at a high level
- Automatically translate the specification to a protocol
- Provide an execution engine for the protocol
Aim for “good enough” performance:
- Focus on accelerating the iterative design process
- Can always hand-tune the implementation later
Outline
Overview of P2
Architecture By Example: Data Model, Dataflow Framework, Query Language
Chord
Additional Benefits: Overlay Introspection, Automatic Optimizations
Conclusion
Traditional Overlay Node
[Figure: a node running a monolithic overlay program; packets flow in and out while the program maintains network state (route tables) directly]
P2 Overlay Node
[Figure: a P2 node. The overlay description, written in a declarative query language, is compiled by the Planner in the P2 query processor into runtime dataflows (Network In / Network Out) that maintain network state in local tables (route, ...); packets flow in and out through the dataflows]
Advantages of the P2 Approach
Declarative query language:
- Concise, high-level expression
- Statically checkable (termination, correctness)
- Ease of modification
Unifying framework for introspection and implementation
Automatic optimizations at the query and dataflow level
Data Model
Relational data: tables and tuples. Two kinds of tables:
- Stored, soft state, e.g. neighbor(Src, Dst), forward(Src, Dst, NxtHop)
- Transient streams:
  - Network messages: message(Rcvr, Dst)
  - Local timer-based events: periodic(NodeID, 10)
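The "soft state" in stored tables means rows expire unless refreshed. A minimal Python sketch of that behavior (the class name and TTL mechanics are illustrative, not P2's actual implementation):

```python
import time

class SoftStateTable:
    """Rows are tuples; each row expires unless refreshed within its TTL."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}  # tuple -> last insertion/refresh time

    def insert(self, row, now=None):
        # Re-inserting an existing row refreshes its timestamp.
        self.rows[row] = time.time() if now is None else now

    def scan(self, now=None):
        now = time.time() if now is None else now
        # Drop rows whose TTL has elapsed, then return the live ones.
        self.rows = {r: t for r, t in self.rows.items() if now - t < self.ttl}
        return set(self.rows)

# succ(NAddr, Succ, SAddr) rows from the ring example later in the talk.
succ = SoftStateTable(ttl_seconds=30)
succ.insert(("IP40", 58, "IP58"), now=0)
succ.insert(("IP58", 60, "IP60"), now=20)
live = succ.scan(now=35)  # the first row has expired; the second is live
```

Timestamps are passed explicitly here so the expiry is deterministic; a real table would use the clock directly.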
Dataflow framework
Dataflow graph of C++ dataflow elements. Similar to Click:
- Flow elements (mux, demux, queues)
- Network elements (congestion control, retry, rate limitation)
In addition:
- Relational operators (joins, selections, projections, aggregation)
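A Click-style element graph can be sketched in a few lines: each element transforms or filters tuples and pushes them downstream. This is a didactic sketch in Python (element names and the dict-based tuples are illustrative; P2's elements are C++):

```python
class Element:
    """Base class: elements are connected into a push chain."""
    def __init__(self):
        self.downstream = None

    def connect(self, nxt):
        self.downstream = nxt
        return nxt  # allow chained wiring: a.connect(b).connect(c)

    def push(self, tup):
        raise NotImplementedError

class Select(Element):
    """Relational selection: pass only tuples satisfying a predicate."""
    def __init__(self, pred):
        super().__init__()
        self.pred = pred

    def push(self, tup):
        if self.pred(tup):
            self.downstream.push(tup)

class Project(Element):
    """Relational projection: reshape each tuple with a mapping function."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def push(self, tup):
        self.downstream.push(self.fn(tup))

class Sink(Element):
    """Collects output tuples (stands in for the Network Out dataflow)."""
    def __init__(self):
        super().__init__()
        self.out = []

    def push(self, tup):
        self.out.append(tup)

# Wire a tiny strand: keep lookups whose key is in (N, Succ], then
# format a response tuple, as in the ring-lookup example later.
sink = Sink()
sel = Select(lambda t: t["N"] < t["K"] <= t["Succ"])
proj = Project(lambda t: ("response", t["Req"], t["K"], t["SAddr"]))
sel.connect(proj).connect(sink)

sel.push({"Req": "IP37", "K": 59, "N": 58, "Succ": 60, "SAddr": "IP60"})
sel.push({"Req": "IP37", "K": 10, "N": 58, "Succ": 60, "SAddr": "IP60"})
```

Only the first tuple survives the selection; the second is dropped.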
Outline
Overview of P2
Architecture By Example: Data Model, Dataflow Framework, Query Language
Chord in P2
Additional Benefits: Overlay Introspection, Automatic Optimizations
Conclusion
Example: Ring Routing
A simple ring routing example.
[Figure: identifier ring with nodes at ids 3, 13, 15, 18, 28, 37, 40, 58, 60 and objects at ids 0, 22, 24, 33, 42, 56]
- Each node has an address and an identifier
- Each object has an identifier
- Every node knows its successor
- Objects are “served” by their successor
Ring State
[Figure: the ring with per-node state, e.g. node(IP40, 40), succ(IP40, 58, IP58) at node 40 and node(IP58, 58), succ(IP58, 60, IP60) at node 58]
Stored tables:
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
Example: Ring Lookup
[Figure: the ring; a lookup(IP40, IP37, 59) message is forwarded around the ring until it reaches the node responsible for key 59]
Find the responsible node for a given key k:
  n.lookup(k)
    if k in (n, n.successor]
      return n.successor.addr
    else
      return n.successor.lookup(k)
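The recursion above can be simulated over a tiny in-memory ring. A Python sketch using the node ids from the figure (the identifier-space size and the wrap-around test are assumptions, since the slides don't state them):

```python
def in_interval(k, n, succ):
    """True iff k lies in the half-open ring interval (n, succ]."""
    if n < succ:
        return n < k <= succ
    return k > n or k <= succ  # interval wraps past 0

def lookup(node, k, succ_of):
    """Follow successor pointers until the owner of key k is found."""
    succ = succ_of[node]
    if in_interval(k, node, succ):
        return succ
    return lookup(succ, k, succ_of)

# Ring from the example: 3 -> 13 -> 15 -> 18 -> 28 -> 37 -> 40 -> 58 -> 60 -> 3
succ_of = {3: 13, 13: 15, 15: 18, 18: 28, 28: 37, 37: 40,
           40: 58, 58: 60, 60: 3}
owner = lookup(37, 59, succ_of)  # key 59 is served by node 60
```

Starting from any node gives the same owner; only the number of hops differs.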
Ring Lookup Events
[Figure: lookup(IP37, IP37, 59) originates at node 37 and is forwarded as lookup(IP58, IP37, 59) past node 40 (node(IP40, 40), succ(IP40, 58, IP58)); node 58 (node(IP58, 58), succ(IP58, 60, IP60)) answers with response(IP37, 59, IP60)]
Event streams:
- lookup(Addr, Req, K)
- response(Addr, K, Owner)
Pseudocode:
  n.lookup(k)
    if k in (n, n.successor] return n.successor.addr
    else return n.successor.lookup(k)
Pseudocode to Dataflow “Strands”
Pseudocode:
  n.lookup(k)
    if k in (n, n.successor] return n.successor.addr
    else return n.successor.lookup(k)
Dataflow strand: Event Stream → Element 1 → Element 2 → … → Element n → Actions
Event: incoming network messages, periodic timers
[Figure: P2 node with Strand 1 and Strand 2 wired between the Network In and Network Out dataflows, with local tables (succ, ...)]
Strand Elements
- Condition: process the event using strand elements
- Action: outgoing network messages, local table updates
[Figure: P2 node with Strand 1 and Strand 2 wired between the Network In and Network Out dataflows, with local tables (succ, ...)]
Pseudocode: Strand 1
Pseudocode:
  n.lookup(k)
    if k in (n, n.successor] return n.successor.addr
    else return n.successor.lookup(k)
Event:     RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
Action:    SEND response(Req, K, SAddr) to Req
Stored tables: node(NAddr, N), succ(NAddr, Succ, SAddr)
Event streams: lookup(Addr, Req, K), response(Addr, K, Owner)
[Figure: P2 node with Strand 1 and Strand 2 dataflows]
Pseudocode to Strand 1
Event:     RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
Action:    SEND response(Req, K, SAddr) to Req
Dataflow strand:
  lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr)
         → Select (K in (N, Succ]) → Project Response(Req, K, SAddr) → response
[Figure: the strand wired between the Network In and Network Out dataflows, joining against the local node and succ tables]
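Strand 1's join/select/project pipeline can be written out directly over in-memory tables. A Python sketch (table layouts follow the slides; the helper is illustrative and, for simplicity, the interval test here ignores ring wrap-around):

```python
# Stored tables from the ring example: node(NAddr, N), succ(NAddr, Succ, SAddr).
node = {("IP40", 40), ("IP58", 58)}
succ = {("IP40", 58, "IP58"), ("IP58", 60, "IP60")}

def strand1(lookup_event):
    """RECEIVE lookup(NAddr, Req, K); if K in (N, Succ], SEND response."""
    naddr, req, k = lookup_event
    out = []
    for (na, n) in node:
        if na != naddr:              # Join: lookup.Addr = node.Addr
            continue
        for (sa, s, saddr) in succ:
            if sa != naddr:          # Join: lookup.Addr = succ.Addr
                continue
            if n < k <= s:           # Select: K in (N, Succ] (no wrap-around)
                # Project: response(Req, K, SAddr), to be sent to Req
                out.append(("response", req, k, saddr))
    return out

msgs = strand1(("IP58", "IP37", 59))  # node 58's successor 60 owns key 59
```

When the key falls outside (N, Succ], the strand emits nothing; that case is handled by Strand 2, which forwards the lookup instead.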
Pseudocode to Strand 2
Event:     RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K not in (N, Succ]
Action:    SEND lookup(SAddr, Req, K) to SAddr
Dataflow strand:
  lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr)
         → Select (K not in (N, Succ]) → Project lookup(SAddr, Req, K) → lookup
[Figure: the strand wired between the Network In and Network Out dataflows, joining against the local node and succ tables]
Strand Execution
[Figure: lookup tuples arriving from the Network In dataflow are fed to both Strand 1 and Strand 2; Strand 1 emits response tuples and Strand 2 emits forwarded lookup tuples into the Network Out dataflow]
Actual Chord Lookup Dataflow
[Figure: the full planner-generated dataflow for Chord lookups. Rule strands L1-L3 join incoming lookup tuples against the node, bestSucc, and finger tables (Join lookup.NI == node.NI, Join lookup.NI == bestSucc.NI, Join bestLookupDist.NI == node.NI); Select K in (N, S]; Project lookupRes; Agg min<D> on finger with D := K - B - 1, B in (N, K); plumbing elements such as TimedPullPush 0, Queue, Mux, Demux (by tuple name / @local?), RoundRobin, Dup, Insert, and Materializations connect Network In to Network Out]
Query Language: Overlog
“SQL” equivalent for overlay networks. Based on Datalog:
- Declarative recursive query language
- Well-suited for querying properties of graphs
- Well-studied in the database literature (static analysis, optimizations, etc.)
Extensions: data distribution, asynchronous messaging, periodic timers, and state modification
Query Language: Overlog
Datalog rule syntax:
  <head> :- <condition1>, <condition2>, … , <conditionN>.
Overlog rule syntax:
  <action> :- <event>, <condition1>, … , <conditionN>.
Query Language: Overlog
Event:     RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
Action:    SEND response(Req, K, SAddr) to Req

As an Overlog rule (<action> :- <event>, <condition1>, … , <conditionN>.):
  response@Req(Req, K, SAddr) :-
    lookup@NAddr(NAddr, Req, K),
    node@NAddr(NAddr, N),
    succ@NAddr(NAddr, Succ, SAddr),
    K in (N, Succ].
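Datalog-style rule matching, which Overlog inherits, amounts to unifying shared variables across the body conditions. A toy single-rule evaluator in Python (far simpler than P2's planner; the representation of rules and the guard mechanism are this sketch's own conventions):

```python
def is_var(term):
    # Convention for this sketch: pattern terms that are capitalized
    # strings are variables (as in Datalog); everything else is a constant.
    return isinstance(term, str) and term[:1].isupper()

def match(pattern, row, env):
    """Extend variable bindings env by matching pattern against row."""
    env = dict(env)
    for p, v in zip(pattern, row):
        if is_var(p):
            if p in env and env[p] != v:
                return None  # conflicting binding for a shared variable
            env[p] = v
        elif p != v:
            return None
    return env

def eval_rule(head, body, db, guard=lambda env: True):
    """Join the body conditions left to right, then project the head."""
    envs = [{}]
    for table, pattern in body:
        envs = [e2 for env in envs for row in db[table]
                if (e2 := match(pattern, row, env)) is not None]
    return [tuple(env[t] for t in head) for env in envs if guard(env)]

db = {
    "lookup": [("IP58", "IP37", 59)],
    "node":   [("IP58", 58)],
    "succ":   [("IP58", 60, "IP60")],
}

# response@Req(Req, K, SAddr) :- lookup@NAddr(NAddr, Req, K),
#   node@NAddr(NAddr, N), succ@NAddr(NAddr, Succ, SAddr), K in (N, Succ].
responses = eval_rule(
    head=("Req", "K", "SAddr"),
    body=[("lookup", ("NAddr", "Req", "K")),
          ("node",   ("NAddr", "N")),
          ("succ",   ("NAddr", "Succ", "SAddr"))],
    db=db,
    guard=lambda e: e["N"] < e["K"] <= e["Succ"],
)
```

The shared variable NAddr is what forces the two joins from the strand diagrams; the guard plays the role of the range selection.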
P2-Chord
Full Chord routing, including:
- Multiple successors
- Stabilization
- Optimized finger maintenance
- Failure recovery
47 OverLog rules, 13 table definitions
Other examples: Narada, flooding, routing protocols
Performance Validation
Experimental setup:
- 100 nodes on the Emulab testbed
- 500 P2-Chord nodes
Main goal: validate expected network properties

Sanity Checks
- Logarithmic diameter and state (“correct”)
- Bandwidth-efficient: 300 bytes/s/node
Churn Performance
Metric: lookup consistency [Rhea et al.]
P2-Chord:
- P2-Chord @ 64 min churn: 97% consistency
- P2-Chord @ 16 min churn: 84% consistency
- P2-Chord @ 8 min churn: 42% consistency
Hand-crafted Chord:
- MIT-Chord @ 47 min churn: 99.9% consistency
- Outperforms P2 under higher churn
P2 is not intended to replace a carefully hand-crafted Chord.
Benefits of P2
- Introspection with queries
- Automatic optimizations
- Reconfigurable transport (work in progress)
Introspection with Queries
- Unifying framework for debugging and implementation: same query language, same platform
- Execution tracing/logging at the rule and dataflow level; log entries stored as tuples and queried
- Correctness invariants and regression tests expressed as queries:
  - “Is the Chord ring well formed?” (3 rules)
  - “What is the network diameter?” (5 rules)
  - “Is Chord routing consistent?” (11 rules)
With Atul Singh (Rice) and Peter Druschel (MPI)
Automatic Optimizations
Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)
Multi-query sharing:
- Common “subexpression” elimination
- Caching and reuse of previously computed results
- Opportunistically share message propagation across rules
[Figure: Strand 1 and Strand 2 share the Join (lookup.Addr = node.Addr) and Join (lookup.Addr = succ.Addr) elements; the shared output then splits into Select K in (N, Succ] → Project Response(Req, K, SAddr) → response and Select K not in (N, Succ] → Project lookup(SAddr, Req, K) → lookup]
Automatic Optimizations
Cost-based optimizations: join ordering affects performance
[Figure: the two lookup strands again, illustrating that the planner may reorder the joins against the node and succ tables]
Open Questions
- What is the role of rapid prototyping?
- How good is “good enough” performance for rapid prototypes?
- When do developers move from rapid prototypes to hand-crafted code?
- Can we achieve “production quality” overlays from P2?
Future Work
- The “right” language
- Formal data and query semantics
- Static analysis: optimizations, termination, correctness
Conclusion
P2: Declarative Overlays
- Tool for rapid prototyping of new overlay networks
Declarative Networks research agenda: specify and construct networks declaratively
- Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)
Thank You
http://p2.cs.berkeley.edu
Latency CDF for P2-Chord
Median and average lookup latency are around 1 s.