Minimal Broker Overlay Design for Content-Based Publish/Subscribe Systems
Naweed Tajuddin
Balasubramaneyam Maniymaran
Hans-Arno Jacobsen
University of Toronto
November 18, 2013, CASCON 2013
Middleware Systems Research Group (MSRG.ORG)
Introduction to Publish/Subscribe
• Messaging platform that decouples information sources and sinks
• GooPS (Google P/S)
  – AdSense, Docs, YouTube
• Yahoo Message Broker (within PNUTS)
  – Data replication for Web apps
  – Eventual consistency
Content-Based Publish/Subscribe
1. Advertise
2. Subscribe
3. Publish
[Figure: publisher (P) and subscriber (S)]
Challenges for Content-Based Publish/Subscribe
[Figure: broker overlay with brokers B1 to B9, subscriber S1, and publishers P1 and P2 (labels: Amazon.ca, Chapters.ca). Subscription shown: "Let me know when HP book < $15". Publications shown: "Sale! HP: $14.99", "Sale! MW: $15.99", "Sale! OED: $24.99".]
Problem Statement
Given a set of publishers and subscribers, how can we design a pub/sub overlay that maximizes performance (low delivery latency) and minimizes cost (number of brokers)?
INPUT
• Brokers available for deployment (processing capacities)
• Publishers (advertisements)
• Subscribers (subscriptions)
• Publication rates per advertisement
OUTPUT
• Set of deployed brokers
• Client-broker allocation
• Overlay topology
CONSTRAINT
• Broker processing capacities not exceeded
Content Space
[Figure: two-dimensional content space with price on the horizontal axis (ticks 2 to 12) and volume on the vertical axis (ticks 2 to 8)]
sub: [price in (2,10)][volume in (2,7)]
pub: [price = 3][volume = 5]
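As a minimal sketch of matching in this content space, the Python below treats a subscription as a conjunction of open intervals, one per attribute, and a publication as a point. The names `matches`, `sub`, and `pub` are illustrative, and the open-interval semantics are an assumption read off the `(2,10)` notation.

```python
def matches(sub, pub):
    """Return True if the publication satisfies every predicate of the subscription.

    Assumed semantics: each predicate is an open interval (low, high), and a
    publication that lacks a constrained attribute does not match.
    """
    return all(
        attr in pub and low < pub[attr] < high
        for attr, (low, high) in sub.items()
    )

# Values taken from the slide.
sub = {"price": (2, 10), "volume": (2, 7)}
pub = {"price": 3, "volume": 5}

print(matches(sub, pub))                         # True
print(matches(sub, {"price": 11, "volume": 5}))  # False
```

Geometrically, the publication is a point that falls inside the subscription's rectangle.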
Similarity Model: Interest
• Publisher-subscriber similarity
• Likelihood that a publication will match a subscription
• Geometric intersection between advertisement and subscription over advertisement size
• $I = \alpha_{12} / |a_1|$
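[Figure: advertisement a1 and subscription s2 as overlapping rectangles in the (att1, att2) plane, with intersection α12; att1 ticks 2, 5, 10, 12 and att2 ticks 2, 4, 7, 12]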
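The interest formula can be sketched for axis-aligned rectangles as follows. The rectangle coordinates for `a1` and `s2` are hypothetical values chosen for illustration (they are not read from the figure), and `overlap_area` and `interest` are illustrative names.

```python
def overlap_area(r1, r2):
    """Area of the intersection of two axis-aligned rectangles.

    Each rectangle maps an attribute name to a (low, high) range; both are
    assumed to constrain the same attributes.
    """
    area = 1.0
    for attr, (low1, high1) in r1.items():
        low2, high2 = r2[attr]
        area *= max(0.0, min(high1, high2) - max(low1, low2))
    return area

def interest(adv, sub):
    """I = |adv ∩ sub| / |adv|; assumes the advertisement has non-zero area."""
    return overlap_area(adv, sub) / overlap_area(adv, adv)

# Hypothetical rectangles, for illustration only.
a1 = {"att1": (2, 10), "att2": (4, 12)}   # area 64
s2 = {"att1": (5, 12), "att2": (2, 7)}    # overlaps a1 in a 5 x 3 region

print(interest(a1, s2))  # 15 / 64 = 0.234375
```

An interest of 0 means the subscription can never match the publisher's publications, and 1 means every publication within the advertisement matches.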
Similarity Model: Commonality
• Subscriber-subscriber similarity
• Likelihood that publications matching one subscription will match another subscription
• Geometric intersection over subscription size
• $C = \alpha_{12}^2 / (|s_1|\,|s_2|)$
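[Figure: subscriptions s1 and s2 as overlapping rectangles in the (att1, att2) plane, with intersection α12; att1 ticks 2, 5, 10, 12 and att2 ticks 2, 4, 7, 12]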
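A small sketch of the commonality formula, again for axis-aligned rectangles. The coordinates of `s1` and `s2` are hypothetical, and the helper names are illustrative rather than taken from the authors' implementation.

```python
def overlap_area(r1, r2):
    """Area of the intersection of two axis-aligned rectangles.

    Each rectangle maps an attribute name to a (low, high) range; both are
    assumed to constrain the same attributes.
    """
    area = 1.0
    for attr, (low1, high1) in r1.items():
        low2, high2 = r2[attr]
        area *= max(0.0, min(high1, high2) - max(low1, low2))
    return area

def commonality(s1, s2):
    """C = alpha_12^2 / (|s1| * |s2|), where alpha_12 = |s1 ∩ s2|."""
    alpha = overlap_area(s1, s2)
    return alpha ** 2 / (overlap_area(s1, s1) * overlap_area(s2, s2))

# Hypothetical subscriptions, for illustration only.
s1 = {"att1": (2, 10), "att2": (4, 12)}   # area 64
s2 = {"att1": (5, 12), "att2": (2, 7)}    # area 35, overlap with s1 is 15

print(round(commonality(s1, s2), 4))  # 225 / 2240, about 0.1004
print(commonality(s1, s1))            # 1.0
```

The measure is symmetric, equals 1 for identical subscriptions, and equals 0 for disjoint ones.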
Estimating Load Impact
• Publishers
  – Publication rate
• Subscribers
  – Σ_pub (Interest × pub rate)
• Brokers
  – Sum of load impact of local publishers and subscribers
  – Load compensation factor: reserve broker capacity to account for pure forwarding traffic
[Figure: subscriber S1 and publishers P1 and P2, with rate labels 10 msgs/s and 8 msgs/s]
Solution Overview
• Two-phase algorithm
  1. Allocate clients across a minimal set of brokers
  2. Cluster brokers with high similarity to form a good overlay topology
• Both problems are NP-complete; see the paper for the proofs
Client-Broker Allocation Algorithm
[Figure: client allocation animation with brokers B1 to B5 and Clients 1 to 5. Panel labels: "Brokers Ranked by Capacity", "Clients Ranked by Client Ranking Function", "Most similar broker", "Client Allocation".]
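The allocation figure suggests a greedy procedure. The sketch below is one plausible reading of it, not the authors' exact algorithm: clients are taken in ranked order, each goes to the most similar already-deployed broker that still has capacity, and a new broker (the largest unused one) is deployed only when none fits. The function names, the toy similarity table `PAIR_SIM`, and all loads and capacities are hypothetical.

```python
def allocate(ranked_clients, capacities, similarity):
    """Greedy client-broker allocation sketch.

    ranked_clients: list of (client, load) in deployment order.
    capacities: dict broker -> processing capacity.
    similarity: function (client, clients_on_broker) -> float.
    Returns dict broker -> list of allocated clients (deployed brokers only).
    """
    brokers_by_capacity = sorted(capacities, key=capacities.get, reverse=True)
    deployed = {}    # broker -> clients allocated so far
    remaining = {}   # broker -> unused capacity
    for client, load in ranked_clients:
        fitting = [b for b in deployed if remaining[b] >= load]
        if fitting:
            # Prefer the deployed broker whose clients are most similar.
            target = max(fitting, key=lambda b: similarity(client, deployed[b]))
        else:
            # Deploy the largest unused broker that can host the client.
            unused = [b for b in brokers_by_capacity
                      if b not in deployed and capacities[b] >= load]
            if not unused:
                raise ValueError(f"no broker can host {client}")
            target = unused[0]
            deployed[target] = []
            remaining[target] = capacities[target]
        deployed[target].append(client)
        remaining[target] -= load
    return deployed

# Toy pairwise similarities (hypothetical); unlisted pairs count as 0.
PAIR_SIM = {frozenset(("c4", "c1")): 0.9, frozenset(("c4", "c3")): 0.2}

def similarity(client, members):
    return max((PAIR_SIM.get(frozenset((client, m)), 0.0) for m in members),
               default=0.0)

clients = [("c1", 6), ("c2", 3), ("c3", 4), ("c4", 1)]
capacities = {"B1": 10, "B2": 8, "B3": 5}
result = allocate(clients, capacities, similarity)
print(result)  # {'B1': ['c1', 'c2', 'c4'], 'B2': ['c3']}
```

In this toy run only two of the three available brokers are deployed, which is the cost the problem statement aims to minimize.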
Client Ranking Function
• Determines the order in which clients are deployed
  – Impacts how broker capacities are consumed
• #1 Greatest load impact (GLI)
  – Clients ranked by load imposed on broker
• #2 Greatest interest (GI)
  – Groups consisting of a single publisher-subscriber pair, ranked by interest
• #3 Greatest interest per group (GIg)
  – Groups consisting of a single publisher and all subscribers with non-zero interest, ranked by greatest interest
• #4 Baseline
  – Clients ranked and allocated in random order
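The four ranking functions can be sketched as simple sorts. This is an illustrative reading of the bullets above: descending order is assumed for "greatest", the within-group ordering in `rank_gig` is an assumption, and all function names and sample values are hypothetical.

```python
import random

def rank_gli(load_impact):
    """GLI: clients in decreasing order of load impact."""
    return sorted(load_impact, key=load_impact.get, reverse=True)

def rank_gi(interest):
    """GI: (publisher, subscriber) pairs in decreasing order of interest."""
    return sorted(interest, key=interest.get, reverse=True)

def rank_gig(interest):
    """GIg: one group per publisher holding its subscribers with non-zero
    interest; groups are ordered by the greatest interest they contain."""
    groups = {}
    for (pub, sub), value in interest.items():
        if value > 0:
            groups.setdefault(pub, []).append((value, sub))
    ordered = sorted(groups, key=lambda p: max(groups[p])[0], reverse=True)
    return [(p, [s for _, s in sorted(groups[p], reverse=True)]) for p in ordered]

def rank_baseline(clients, seed=None):
    """Baseline: clients in random order (seed only for reproducibility)."""
    shuffled = list(clients)
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Hypothetical sample values.
interest = {("P1", "S1"): 0.15, ("P2", "S1"): 0.20,
            ("P1", "S2"): 0.50, ("P2", "S2"): 0.0}
load_impact = {"P1": 8, "P2": 4, "S1": 2}

print(rank_gli(load_impact))  # ['P1', 'P2', 'S1']
print(rank_gi(interest)[0])   # ('P1', 'S2')
print(rank_gig(interest))     # [('P1', ['S2', 'S1']), ('P2', ['S1'])]
```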
Overlay Topology Construction
1. Assign each link a weight equal to the commonality of its broker pair
2. Compute the maximum spanning tree to obtain the overlay topology
[Figure: brokers A, B, C, and D hosting advertisements a1 to a4 and subscriptions s1 to s7]
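The two steps above can be sketched with Kruskal's algorithm run on edges sorted by decreasing weight. Kruskal is a choice made for this illustration; the slides do not say which spanning-tree algorithm is used. The commonality weights between brokers A to D are hypothetical.

```python
def max_spanning_tree(brokers, weights):
    """Maximum spanning tree via Kruskal's algorithm.

    brokers: iterable of broker names.
    weights: dict (u, v) -> commonality of the broker pair.
    Returns the list of tree edges in the order they were added.
    """
    parent = {b: b for b in brokers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider the heaviest (most similar) links first.
    for (u, v), _ in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding the link creates no cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree

# Hypothetical commonality weights.
weights = {("A", "B"): 0.6, ("A", "C"): 0.1, ("A", "D"): 0.3,
           ("B", "C"): 0.4, ("B", "D"): 0.2, ("C", "D"): 0.5}
tree = max_spanning_tree("ABCD", weights)
print(tree)  # [('A', 'B'), ('C', 'D'), ('B', 'C')]
```

The resulting tree links the broker pairs with the highest commonality, so brokers whose subscribers want similar content end up adjacent.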
Evaluation Overview / Experiment Steps
• Algorithms implemented in Java and simulated using the JiST discrete event simulator
1. Execute algorithms to compute overlay design
2. Configure pub/sub system simulator overlay according to overlay design
3. Run experiment and record statistics
Overlay Performance
Message count: number of messages generated per publication (or number of broker hops publication must travel to reach all interested subscribers)
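For a tree overlay, the message count of one publication can be sketched as the number of distinct overlay links on the paths from the publisher's broker to the brokers of all interested subscribers. This counts broker-to-broker hops only, which is an assumption about how the metric is measured; the function name and the example overlay are hypothetical.

```python
from collections import deque

def message_count(tree_edges, source, targets):
    """Distinct overlay links a publication crosses from the source broker to
    reach every target broker. Assumes a connected tree overlay."""
    neighbours = {}
    for u, v in tree_edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    # Breadth-first search from the source records each broker's parent.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)

    # Walk back from each target and collect the links used; shared links
    # are counted once because the publication is forwarded over them once.
    used = set()
    for target in targets:
        node = target
        while parent[node] is not None:
            used.add((parent[node], node))
            node = parent[node]
    return len(used)

edges = [("A", "B"), ("B", "C"), ("C", "D")]   # hypothetical chain overlay
print(message_count(edges, "A", {"C", "D"}))   # 3
print(message_count(edges, "B", {"A", "C"}))   # 2
```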
Conclusions
• Optimization framework for pub/sub overlay construction
• Similarity framework: interest and commonality
  – Tools for overlay construction
  – Leverage existing semantics of content-based pub/sub
• Load modeling framework
  – Broker congestion significantly reduced
• Client allocation and overlay topology construction algorithms
  – Low-latency overlay topologies at reduced cost
• Future work
  – Support additional constraints and incorporate network congestion
  – Account for physical network and broker capacity model
Thank You! Questions?
** Extra Slides **
Similarity Model: Commonality
• Subscriber-subscriber similarity
• Likelihood that publications matching one subscription will match another subscription
• Geometric intersection over subscription size
• $C = \alpha_{12}^2 / (|s_1|\,|s_2|)$
[Figure: two panels in the (att1, att2) plane, each labeled class = “sale”. One shows subscriptions s1 and s2 with intersection α12; the other shows subscriptions s3 and s4 with intersection α34.]
Evaluation Overview
• Algorithms implemented in Java
• Pub/Sub system built using JiST discrete event simulator
• Workload details
  – 200-1000 publishers
  – 600-3000 subscribers
  – 12000-48000 pubs
  – Pub rates: 1-10 msgs/s
  – Broker capacity: 1000 msgs/s
[Figure labels: # of Advs, # of Subs, # of Pubs]
Overlay Cost
Maximum Peak Load
Load and Congestion Effects
LCF Cost–Performance Tradeoff
Overlay Design – Related Work
[Figure: two overlay diagrams, each with brokers B1 to B9, subscriber S1, and publishers P1 and P2]
• Rewire overlay [Baldoni et al., 2007], [Yoon et al., 2013]
• Move publishers [Cheung et al., 2010]
Estimating Load Impact of Clients
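• Publisher
  – Publication rate
• Subscriber
  – Σ(Interest × pub rate)
[Figure: subscriber S1 (subscription sS1) and publishers P1 and P2 (advertisements aP1 and aP2), with intersections i1 and i2, interests iS1-P1 = 0.15 and iS1-P2 = 0.20, and rate labels 10 msgs/s and 8 msgs/s]
Load impact (S1) = 8(0.15) + 4(0.2) = 2 msgs/s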
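The worked sum above can be reproduced with a few lines of Python. The sketch uses the rates that appear in the sum itself (8 and 4 msgs/s); the function and variable names are illustrative.

```python
def subscriber_load(interests, rates):
    """Estimated load impact of one subscriber: the sum over publishers of
    interest times publication rate."""
    return sum(interests[p] * rates[p] for p in interests)

interests = {"P1": 0.15, "P2": 0.20}   # interest of S1 in each publisher
rates = {"P1": 8, "P2": 4}             # msgs/s, as used in the worked sum

load_s1 = subscriber_load(interests, rates)
print(f"{load_s1:.2f} msgs/s")  # 2.00 msgs/s
```

A publisher's own load impact is simply its publication rate, so no separate function is needed for it.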
Estimating Broker Load
• Sum of load impact of local publishers and subscribers
• Pure forwarding traffic
• Load compensation factor: reserve a fraction of broker capacity for pure forwarding traffic
[Figure: subscriber S1 and publishers P1 and P2, with rate labels 10 msgs/s and 8 msgs/s]
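A minimal sketch of the broker load estimate and the capacity check. How the load compensation factor is applied is not spelled out on the slide; the sketch assumes it is a fraction `lcf` of capacity held back for forwarding traffic, so only `capacity * (1 - lcf)` is available to local clients. All names and the value `lcf = 0.2` are hypothetical.

```python
def broker_load(publisher_rates, subscriber_loads):
    """Estimated load from local clients: publisher rates plus subscriber
    load impacts, all in msgs/s."""
    return sum(publisher_rates) + sum(subscriber_loads)

def usable_capacity(capacity, lcf):
    """Capacity left for local clients after reserving the fraction `lcf`
    for pure forwarding traffic (assumed interpretation)."""
    return capacity * (1.0 - lcf)

def fits(capacity, lcf, publisher_rates, subscriber_loads):
    """True if the local clients fit within the usable capacity."""
    return broker_load(publisher_rates, subscriber_loads) <= usable_capacity(capacity, lcf)

# Two local publishers (10 and 8 msgs/s) and one subscriber with load 2 msgs/s.
print(broker_load([10, 8], [2.0]))      # 20.0
print(fits(1000, 0.2, [10, 8], [2.0]))  # True
print(fits(20, 0.2, [10, 8], [2.0]))    # False: only 16 msgs/s is usable
```

A larger factor leaves more headroom for forwarding but forces more brokers to be deployed, which is the tradeoff named on the LCF Cost–Performance Tradeoff slide.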