
  • IBM Research

    © 2011 IBM Corporation

    BlueDove Pub/Sub: A Scalable and Elastic Publish/Subscribe Service

    Fan Ye, IBM T. J. Watson Research Center

    In collaboration with Ming Li, Minkyong Kim, Han Chen, Hui Lei

  • Emerging Smarter Planet Applications

    (Figure: Smarter Transportation and Smarter Grid scenarios, with traffic cameras, smartphones, and speed sensors as information sources; smartphones report current location & speed)

    "How is the traffic from my current location to my destination?"

    Such applications require a highly efficient messaging service

  • Challenges to Messaging Service

    Large number of clients, high information rate: scalability

    Dynamic workload: elasticity

    Personalized information: real-time, fine-grained filtering

    Uninterrupted service: high availability

  • General Publish/Subscribe Messaging Model

    Publishers send events to the pub/sub server; subscribers register subscriptions with it

    The pub/sub server
    – Stores subscriptions
    – Matches events to subscriptions
    – Pushes matching events to subscribers

    Benefits: decoupling of information producers and consumers; fine-grained event filtering
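    To make the model concrete, a minimal sketch of the store/match/push loop (illustrative only, not the BlueDove implementation; the Subscription and PubSubServer types are hypothetical):

```python
# Minimal pub/sub model: store subscriptions, match events, push to subscribers.
# Illustrative only; attribute names and the push target are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Subscription:
    subscriber: str
    ranges: dict  # attribute -> (low, high), inclusive bounds

    def matches(self, event: dict) -> bool:
        return all(lo <= event.get(attr, float("-inf")) <= hi
                   for attr, (lo, hi) in self.ranges.items())


@dataclass
class PubSubServer:
    subscriptions: list = field(default_factory=list)

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)          # store the subscription

    def publish(self, event: dict) -> None:
        for sub in self.subscriptions:          # match event to subscriptions
            if sub.matches(event):
                self.push(sub.subscriber, event)

    def push(self, subscriber: str, event: dict) -> None:
        print(f"push {event} -> {subscriber}")  # stand-in for network delivery


server = PubSubServer()
server.subscribe(Subscription("alice", {"Speed": (30, 40)}))
server.publish({"Speed": 35})                   # delivered to alice
server.publish({"Speed": 55})                   # filtered out
```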

  • Existing Pub/Sub Systems

    Centralized enterprise pub/sub
    – Multiple fully connected, dedicated cluster servers

    Peer-to-peer public pub/sub
    – Large number of partially connected, globally distributed machines
    – Examples: Meghdoot, PastryStrings, Hermes

  • Centralized Enterprise Pub/Sub

    Supports enterprise applications
    – Relatively static clients
    – Predictable workload
    – Strict requirements on response time

    Design
    – Centralized and relatively static servers
    – Fully connected topology
    – Subscriptions are replicated to all servers

    Challenges
    – Hard to provision for a highly dynamic workload
    – Heavy administration when the system scale is large

    (Figure: one publisher and four subscribers; every server holds a full replica of all subscriptions Sub1-Sub4)

  • Peer-to-Peer Public Pub/Sub

    Supports public applications
    – Clients join or leave frequently
    – Highly dynamic workload
    – Flexible requirements on response time and false rate

    Architecture
    – Publishers and subscribers also act as pub/sub servers
    – Servers form a self-maintained logical overlay
    – Subscriptions/publications are mapped onto the overlay
    – Routing is done in a multi-hop manner

    Challenges
    – Bottlenecks due to skewed distribution
    – Performance issues due to the dynamics and heterogeneity of servers
    – Large network delay

    (Figure: a ring overlay with nodes at 0, 10, 40, 75, 90, 120; each node owns one Speed interval, e.g. (10~40]. Both the subscription {Speed: 30~40} and the event {Speed: 35} are routed to the node owning that interval, where they meet. A toy sketch of this mapping follows)
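    A toy sketch of the range-to-ring mapping from the figure (the node IDs follow the example; a real system routes lookups over multiple overlay hops rather than scanning a local list):

```python
# Toy DHT-style ring: each node owns the speed interval (prev_id, node_id].
# Node IDs follow the figure; real systems route lookups in O(log N) hops.
import bisect

NODE_IDS = [0, 10, 40, 75, 90, 120]    # node positions on the ring

def owner(speed: float) -> int:
    """Return the node owning the interval that contains `speed`."""
    i = bisect.bisect_left(NODE_IDS, speed)
    return NODE_IDS[i % len(NODE_IDS)]  # wrap: (120~0] belongs to node 0

# The subscription {Speed: 30~40} lands on the node owning (10~40]; the
# event {Speed: 35} is routed to the same node, where they meet.
print(owner(35))   # -> 40
print(owner(95))   # -> 120
print(owner(125))  # -> 0 (wraps around)
```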

  • BlueDove Pub/Sub Cloud

    (Figure: BlueDove positioned relative to centralized enterprise pub/sub and peer-to-peer public pub/sub)

  • BlueDove Pub/Sub Cloud Architecture

    (Figure: inside a data center, the pub/sub servers form a gossip-based single-hop overlay. Publishers and subscribers are the clients; subscriptions are placed by multi-dimension subscription assignment, and events are routed by performance-oriented event forwarding)

  • Gossip-based Single-hop Overlay

    Each server maintains a global view of the overlay
    – The contact information of every other server
    – Enables one-hop forwarding

    Overlay changes are propagated by gossip
    – Each server periodically exchanges info with a few random servers
    – Any state change propagates to the whole network in O(log N) rounds

    Advantages
    – Automatic, low-overhead maintenance
    – Low network delay

    (Figure: six servers in a single-hop overlay, each holding an overlay table of server ID to IP mappings, e.g. 2 -> 192.168.1.2, 4 -> 192.168.1.4; the tables converge through gossip. A simplified gossip round is sketched below)
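    A simplified sketch of such a gossip exchange, assuming versioned overlay entries and one random peer per round; Cassandra's actual gossiper, used in the implementation, is considerably more involved:

```python
# Simplified gossip: each server merges a random peer's overlay table into
# its own, keeping the higher-versioned entry per server ID. After O(log N)
# rounds all tables converge. Versioned entries are an assumption here.
import random

class Server:
    def __init__(self, sid: int, ip: str):
        self.sid = sid
        self.table = {sid: (1, ip)}   # overlay table: ID -> (version, IP)

    def merge(self, other_table: dict) -> None:
        for sid, (ver, ip) in other_table.items():
            if sid not in self.table or self.table[sid][0] < ver:
                self.table[sid] = (ver, ip)

def gossip_round(servers: list) -> None:
    for s in servers:
        peer = random.choice([p for p in servers if p is not s])
        s.merge(peer.table)           # symmetric push-pull exchange
        peer.merge(s.table)

servers = [Server(i, f"192.168.1.{i}") for i in range(1, 7)]
rounds = 0
while any(len(s.table) < len(servers) for s in servers):
    gossip_round(servers)
    rounds += 1
print(f"converged after {rounds} rounds")  # typically around log2(N)
```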

  • Multi-dimensional Subscription Assignment

    Multiple dimensions are mapped to the overlay simultaneously
    – Each dimension is partitioned into sections
    – Each server maintains one section of each dimension
    – Partition information is propagated by gossip and known to all servers

    Each subscription is mapped in all K dimensions and stored by K servers simultaneously
    – Each server maintains a separate index for the subscriptions assigned to it on each dimension

    Advantages
    – Redundancy across servers
    – Allows the system to exploit the skewness in the distribution of subscriptions

    (Figure: two dimensions, Speed with sections S1-S4 and Location_x with sections L1-L4, partitioned across four servers; the subscription {Speed: 10~20, Longitude: 40°~41°} is stored once per dimension, on the server owning the matching Speed section and on the server owning the matching Location section)
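    A sketch of the assignment rule under assumed uniform partitions; the slides do not specify which point of the subscribed range anchors the mapping, so anchoring on the low end is an assumption:

```python
# Multi-dimensional subscription assignment: each of K dimensions is split
# into equal sections across the servers; a subscription is stored on one
# server per dimension. Anchoring by the range's low end is an assumption;
# the slides leave the exact mapping rule unspecified.
SERVERS = ["srv1", "srv2", "srv3", "srv4"]
SPACE = 1000.0                                   # assumed value space per dimension

def section_owner(value: float) -> str:
    width = SPACE / len(SERVERS)                 # 250 per section
    return SERVERS[min(int(value // width), len(SERVERS) - 1)]

def assign(subscription: dict) -> dict:
    """Map a {dim: (low, high)} subscription to one server per dimension."""
    return {dim: section_owner(low)
            for dim, (low, high) in subscription.items()}

sub = {"Speed": (10, 20), "Location_x": (400, 410)}
print(assign(sub))   # {'Speed': 'srv1', 'Location_x': 'srv2'}
```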

  • Performance-oriented Event Forwarding

    Select the most efficient dimension to match each event
    – Servers exchange per-dimension load information through gossip
    – Possible load metrics: number of subscriptions, average response time, CPU load, ...
    – Each event has K candidate servers, one per dimension
    – Select the candidate with the best performance metric

    Policy 1: minimum number of subscriptions
    – Does not work well in a heterogeneous environment

    Policy 2: minimum average response time
    – May not adapt fast enough when server load changes quickly

    Policy 3: minimum predicted response time
    – Predicted time = Match_Time × (QLen_old + T × (Event_Rate - Match_Rate))
    – Adapts to heterogeneous environments and fast-changing server load

    Advantages
    – Exploits the skewness of subscriptions
    – Load balancing

    (Figure: the event {Speed: 25, Longitude: 33°} has one candidate server per dimension; one candidate holds 1 subscription in its index and the other holds 3, so the event is forwarded to the lighter-loaded candidate. A sketch of Policy 3 follows)
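    Policy 3's predictor can be read directly off the formula above; a minimal sketch, with hypothetical per-server load statistics of the kind gossiped among servers:

```python
# Policy 3: pick the candidate server with the minimum predicted response
# time, using the slide's formula:
#   predicted = match_time * (qlen_old + T * (event_rate - match_rate))
# The stats record and its field names are hypothetical.
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    match_time: float    # avg time to match one event (ms)
    qlen_old: float      # queue length at the last gossip report
    event_rate: float    # incoming events per ms
    match_rate: float    # matched (drained) events per ms

def predicted_response_time(s: ServerStats, horizon_ms: float) -> float:
    backlog = s.qlen_old + horizon_ms * (s.event_rate - s.match_rate)
    return s.match_time * max(backlog, 0.0)   # queue cannot go negative

def pick_server(candidates: list, horizon_ms: float = 100.0) -> ServerStats:
    # one candidate per dimension; forward the event to the best one
    return min(candidates, key=lambda s: predicted_response_time(s, horizon_ms))

speed_srv = ServerStats("Speed-dim server", match_time=0.2,
                        qlen_old=50, event_rate=1.2, match_rate=1.0)
loc_srv = ServerStats("Location-dim server", match_time=0.3,
                      qlen_old=10, event_rate=0.8, match_rate=1.0)
print(pick_server([speed_srv, loc_srv]).name)   # -> "Location-dim server"
```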

  • Implementation

    (Figure: BlueDove pub/sub server components. A Dispatcher handles Subscription Forward through a Partitioner and Event Forward through Dimension Select, backed by a Server Map. Each pub/sub server runs a Gossiper (from Cassandra) maintaining the One-hop Overlay, a Load Monitor with Load Collector and Load Predictor, a Bootstrap module, and a Matching Engine over the stored Subscriptions)

  • Testbed and Experiment Setup

    Testbed
    – 25 RC2 machines: 20 pub/sub servers, 3 dispatchers, 2 to simulate clients

    Simulators emulate publishers and subscribers
    – Each subscription or publication has four attributes
    – Each attribute in a subscription represents a range of width 200
      • The center of the range follows a normal distribution cropped to the range size of 1000, with standard deviation 250 and means 125, 375, 625, 875 for dimensions 1-4
    – Each attribute in an event is a point in [0, 1000]
      • The point follows a uniform random distribution

    Metrics
    – Response time: the period from when a publication is generated to when it is received by the corresponding subscribers
    – Saturation event rate: the highest event rate the system can handle without publication queue overflow
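    The subscription and event workload described above can be reproduced roughly as follows (a sketch; "cropped" is interpreted as clamping the range center into [0, 1000], which is an assumption):

```python
# Generate the experiment workload: 4-attribute subscriptions whose range
# centers follow a normal distribution (sd 250, per-dimension means 125,
# 375, 625, 875), clamped into [0, 1000]; events are uniform points.
# Interpreting "cropped" as clamping is an assumption.
import random

MEANS, SD, WIDTH, SPACE = [125, 375, 625, 875], 250, 200, 1000

def make_subscription() -> list:
    ranges = []
    for mean in MEANS:                       # one range per dimension
        center = min(max(random.gauss(mean, SD), 0), SPACE)
        ranges.append((center - WIDTH / 2, center + WIDTH / 2))
    return ranges

def make_event() -> list:
    return [random.uniform(0, SPACE) for _ in MEANS]

random.seed(42)
subs = [make_subscription() for _ in range(40_000)]
events = [make_event() for _ in range(10)]
print(subs[0][0], events[0][0])
```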

  • Response time behavior vs. event rate

    Response time series before saturation: 40K subscriptions, 20 servers, 100K events/sec
    (Figure: response time (milliseconds) over a run of 1,000,000 ms)

    Response time series after saturation: 40K subscriptions, 20 servers, 150K events/sec
    (Figure: response time (milliseconds) over a run of 1,000,000 ms)

  • Impact of event rate on average response time

    Compare BlueDove to the full-replication scheme
    – 20 servers
    – 40,000 subscriptions

    (Figure: response time (milliseconds, log scale 1 to 1,000,000) vs. event rate (1,000 events per second, 0 to 150), for full-rep and bluedove)

  • Scalability to event rate

    Fix the number of subscriptions (40K); increase the number of servers

    (Figure: saturation event rate (events per millisecond, 0 to 120) vs. number of servers (5 to 20), for bluedove, full-rep, and p2p)

  • Scalability to number of subscriptions

    Fix the event rate; increase the number of servers

    (Figure: number of subscriptions supported (0 to 60,000) vs. number of servers (5 to 20))

  • CPU load distribution

    CPU load across different servers
    – 20 servers; 40,000 subscriptions; 100,000 events per second for BlueDove, 27,000 events per second for the p2p scheme

    (Figure: CPU load (0 to 14) per server ID (1 to 19), for bluedove and p2p)

  • Matching load distribution

    Number of events (×10^9) matched on different servers
    – 20 servers; 40,000 subscriptions; 100,000 events per second for BlueDove, 27,000 events per second for the p2p scheme

    (Figure: number of events matched (0 to 3 ×10^9) per server ID (1 to 19), for bluedove and p2p)

  • Different Event Dispatching Policies

    Compare random, subscription-number-based, response-time-based, and prediction-based policies

    (Figure: saturation event rate (0 to 7,000) for the Prediction, Rsp-Time, and Sub-Num policies)

  • Elasticity upon increasing load

    The event rate increases continuously from 600 to 6,000 events/sec; initially there are 5 nodes

    (Figure: load over time (milliseconds), with markers showing nodes 6, 7, and 8 joining as the event rate rises)

  • Impact of the number of partition dimensions

    Fix the number of subscriptions and change the number of partition dimensions
    – 20 servers; 40,000 subscriptions

    (Figure: saturation event rate (events per millisecond, 0 to 120) vs. number of dimensions (1 to 4))

  • Impact of the skewness of the distribution of subscriptions

    Change the standard deviation of the subscription distribution
    – 20 servers; 40,000 subscriptions

    (Figure: saturation event rate (events per millisecond, 60 to 120) vs. standard deviation (200 to 1000))

  • Summary

    A cloud-based publish/subscribe service answers the challenge of interconnecting numerous information producers and consumers in the Smarter Planet

    BlueDove exploits the skewness in data distribution and the differences across multiple dimensions to achieve high performance

    Evaluation has shown linear capacity scaling and agile adaptation to changes in workload

  • Thank you!

    Questions?

  • Impact of the skewness of the distribution of events

    Change the number of event dimensions that are skewed in the same way as their corresponding subscription dimensions
    – 20 servers; 40,000 subscriptions

    (Figure: saturation event rate (events per millisecond, 40 to 120) vs. number of skewed dimensions (0 to 4))

  • Future Work

    Explicit load balancing
    – Move subscriptions from overloaded nodes to less-loaded ones

    Integration with the BlueDove queue
    – The BlueDove queue takes over the responsibility of pushing publications to subscribers

    Persistence
    – "Outsource" data storage to a cloud-based store (e.g. Cassandra)

  • Cross-data-center replication of subscriptions

    Implicit replication
    – Each subscription is already replicated to multiple servers
    – With high probability, at least two of them are in different data centers

    Explicit replication
    – If all replicas are in the same data center, explicitly make a copy in another data center (a sketch follows)
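    The explicit-replication rule reduces to a short placement check; a sketch in which the server-to-data-center map and the fallback choice are hypothetical:

```python
# Explicit replication: if a subscription's implicit replicas all landed
# in one data center, add one copy in another. The datacenter map and the
# fallback choice are illustrative, not BlueDove's actual placement logic.
DATACENTER = {                       # hypothetical server -> DC assignment
    "srv1": "dc-east", "srv2": "dc-east",
    "srv3": "dc-west", "srv4": "dc-west",
}

def ensure_cross_dc(replica_servers: list) -> list:
    dcs = {DATACENTER[s] for s in replica_servers}
    if len(dcs) > 1:
        return replica_servers       # implicit replication already suffices
    # all replicas share one DC: copy to any server in a different DC
    fallback = next(s for s, dc in DATACENTER.items() if dc not in dcs)
    return replica_servers + [fallback]

print(ensure_cross_dc(["srv1", "srv3"]))   # already cross-DC, unchanged
print(ensure_cross_dc(["srv1", "srv2"]))   # gains a dc-west replica
```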