CS347 Notes13 1 CS 347: Parallel and Distributed Data Management Notes XX: Publish/Subscribe Hector Garcia-Molina

Post on 28-Dec-2015

CS 347: Parallel and Distributed Data Management
Notes XX: Publish/Subscribe

Hector Garcia-Molina

Point to Point Communication

To: Joe
From: Sally
Body: B

Publish/Subscribe Communication

Publication
  Description: D
  Body: B

Publish/Subscribe System

Subscription
  Query: Q
  Id: I

P/S Semantics

• Subscribe(query Q; id I):
    add [Q, I] to SDB

• Publish(description D; body B):
    for [Q, I] in SDB do
      if Q[D] then notify(I, B)

[Figure: Publication (Description: D, Body: B), Subscription (Query: Q, Id: I), and the SDB]

Notify can send email, or can put B in DB(I) (which I checks periodically).
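The Subscribe/Publish semantics above can be sketched in a few lines of Python. This is only an illustration: the class name `PubSub`, the use of a callable as the query Q, and the mailbox dictionary standing in for DB(I) are assumptions, not part of the notes.

```python
class PubSub:
    """Minimal in-memory sketch of the P/S semantics (names are hypothetical)."""

    def __init__(self):
        self.sdb = []  # SDB: list of [Q, I] pairs
        self.db = {}   # DB(I): one mailbox per subscriber id

    def subscribe(self, query, sub_id):
        # Subscribe(Q; I): add [Q, I] to SDB
        self.sdb.append((query, sub_id))
        self.db.setdefault(sub_id, [])

    def publish(self, description, body):
        # Publish(D; B): notify every subscriber whose query accepts D
        for query, sub_id in self.sdb:
            if query(description):
                self.notify(sub_id, body)

    def notify(self, sub_id, body):
        # Assumption: notify puts B in DB(I); sending email would go here instead
        self.db[sub_id].append(body)


ps = PubSub()
ps.subscribe(lambda d: "sports" in d, "I1")
ps.subscribe(lambda d: "politics" in d, "I2")
ps.publish({"sports"}, "B1")
ps.publish({"sports", "politics"}, "B2")
print(ps.db)
```

Note that a publication is matched only against subscriptions already in SDB; nothing is stored for subscribers who arrive later.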

P/Q Semantics (Alternative)

• Publish(description D; body B):
    add [D, B] to PDB

• Query(query Q; id I):
    for [D, B] in PDB do
      if Q[D] then notify(I, B)

[Figure: Publication (Description: D, Body: B), Subscription (Query: Q, Id: I), and the PDB]

Notify could just send out new material (not seen by previous queries).
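A minimal sketch of the P/Q alternative, where publications are stored and queries scan them. The per-id cursor used to return only new material is an assumption (it presumes each id re-issues the same query); the notes do not say how "not seen by previous queries" is tracked.

```python
class PubQuery:
    """Sketch of P/Q semantics: store publications, match at query time."""

    def __init__(self):
        self.pdb = []   # PDB: list of [D, B] pairs in arrival order
        self.seen = {}  # hypothetical cursor: id -> number of PDB entries already scanned

    def publish(self, description, body):
        # Publish(D; B): add [D, B] to PDB
        self.pdb.append((description, body))

    def query(self, q, query_id):
        # Query(Q; I): scan only the entries this id has not scanned before
        start = self.seen.get(query_id, 0)
        new_bodies = [body for desc, body in self.pdb[start:] if q(desc)]
        self.seen[query_id] = len(self.pdb)
        return new_bodies


def is_sports(description):
    return "sports" in description


pq = PubQuery()
pq.publish({"sports"}, "B1")
first = pq.query(is_sports, "I1")
pq.publish({"business"}, "B2")
pq.publish({"sports"}, "B3")
second = pq.query(is_sports, "I1")
print(first, second)
```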

P/S Features

• Space decoupling: interacting parties do not need to know each other

• Time decoupling: interacting parties do not need to actively participate at the same time

• Synchronization decoupling: publishers and subscribers do not block for each other

Other Communication Models

• Message passing
• RPC (remote procedure call)
• Shared Spaces (bulletin board)
• Message Queues

Description/Query Models

• (flat) Topics
  – e.g.: Topics = {sports, business, politics, ...}

Pub P1: Description: {s}, Body: B1
Pub P2: Description: {s, b}, Body: B2

Sub S1: Query: {b}, Id: I1
Sub S2: Query: {s, p}, Id: I2

(subscriptions S1 and S2 are stored in SDB)

Description/Query Models

• (flat) Topics
  – T is set of possible topics
  – description D is a subset of T
  – query Q is a subset of T
  – D matches Q if D ∩ Q ≠ ∅ (empty set)
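As a quick check of the flat-topic rule, here is a small Python sketch that applies it to P1, P2, S1, S2 from the earlier example. The function name `matches` is an assumption for illustration.

```python
def matches(description, query):
    """Flat topics: D matches Q when the two topic sets share at least one topic."""
    return bool(description & query)


pubs = {"P1": {"s"}, "P2": {"s", "b"}}
subs = {"S1": {"b"}, "S2": {"s", "p"}}

results = {(p, s): matches(d, q) for p, d in pubs.items() for s, q in subs.items()}
for pair, matched in sorted(results.items()):
    print(pair, matched)
```

Under this rule P1 reaches only S2, while P2 reaches both S1 and S2.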

Description/Query Models

• Topic Hierarchy

all
  sports
    soccer
    football
      college
      NFL
    rugby
  business
    tech
    service
  politics

P1.D = all.sports.football
P2.D = all.sports
P3.D = {all.business.tech, all.politics}

S1.Q = all.sports.rugby
S2.Q = {all.politics.calif, all.sports.soccer}

Description/Query Models

• Topic Hierarchy
  – T is tree of topics (or DAG?)
  – description D is a set of paths in T
  – query Q is a set of paths in T
  – description path d matches query path q if q is a prefix of d
  – D matches Q if there exists a path in Q that matches a path in D
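The prefix rule can be sketched as follows, using the P1-P3 and S1-S2 values from the hierarchy example. Paths are compared component by component rather than as raw strings. The subscription `S3` (query `all.sports`) is hypothetical, added only so that the example contains at least one match.

```python
def path(text):
    """Split 'all.sports.football' into ('all', 'sports', 'football')."""
    return tuple(text.split("."))


def path_matches(d, q):
    """Description path d matches query path q if q is a prefix of d."""
    return d[:len(q)] == q


def matches(description, query):
    """D matches Q if some path in Q matches some path in D."""
    return any(path_matches(d, q) for d in description for q in query)


pubs = {
    "P1": {path("all.sports.football")},
    "P2": {path("all.sports")},
    "P3": {path("all.business.tech"), path("all.politics")},
}
subs = {
    "S1": {path("all.sports.rugby")},
    "S2": {path("all.politics.calif"), path("all.sports.soccer")},
    "S3": {path("all.sports")},  # hypothetical subscription, not in the notes
}

matched = sorted((p, s) for p, d in pubs.items() for s, q in subs.items() if matches(d, q))
print(matched)
```

With this rule, S1 and S2 match none of P1-P3, because each of their query paths is more specific than (or unrelated to) every description path.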

Description/Query Models

• Key-value pairs

P1.D = {[price, 50], [size, L]}
P2.D = {[price, 80]}
P3.D = {[size, M], [size, L]}

S1.Q = {[price, 50]}
S2.Q = {[price, 50], [size, M]}
S3.Q = [price > 40] AND [size = L]
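A sketch of how a predicate query such as S3 might be evaluated against key-value descriptions. The notes do not define the matching rule for this model, so the rule used here is an assumption: a predicate on a key holds if at least one pair with that key satisfies it, and a query is the AND of its predicates.

```python
import operator

# Descriptions as lists of (key, value) pairs; P3 has two values for "size"
P1 = [("price", 50), ("size", "L")]
P2 = [("price", 80)]
P3 = [("size", "M"), ("size", "L")]

# S3.Q = [price > 40] AND [size = L], written as (key, operator, constant) triples
S3 = [("price", operator.gt, 40), ("size", operator.eq, "L")]


def satisfies(description, predicate):
    """Assumed rule: some pair with the predicate's key must satisfy the comparison."""
    key, op, constant = predicate
    return any(k == key and op(v, constant) for k, v in description)


def matches(description, query):
    """A conjunctive query matches when every predicate is satisfied."""
    return all(satisfies(description, pred) for pred in query)


for name, desc in [("P1", P1), ("P2", P2), ("P3", P3)]:
    print(name, matches(desc, S3))
```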

Matching Descriptions to Queries

[Figure: a Publication (Description: D, Body: B) is matched against Subscriptions (Query: Q, Id: I) stored in SDB]

Generic Distributed Matching

[Figure: 3×2 grid of nodes with rows {a,b}, {c,d}, {e,f}. Publication Pi goes to any row; subscription Sj goes to any column; the node where that row and column intersect is marked "Pi/Sj match".]

• pub to one of {a,b}, {c,d}, {e,f}
• sub to one of {a,c,e}, {b,d,f}

Sound familiar??
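The reason the grid works is that every row shares exactly one node with every column, so whichever row a publication picks and whichever column a subscription picks, they meet somewhere. A small Python check of that property (variable and function names are illustrative):

```python
from itertools import product

GRID = [["a", "b"], ["c", "d"], ["e", "f"]]
ROWS = [set(row) for row in GRID]        # pub goes to all nodes of one row
COLS = [set(col) for col in zip(*GRID)]  # sub goes to all nodes of one column


def meeting_node(row, col):
    """Return the single node shared by a row and a column."""
    (node,) = row & col  # unpacking fails if the intersection is not exactly one node
    return node


meeting = {
    (r, c): meeting_node(ROWS[r], COLS[c])
    for r, c in product(range(len(ROWS)), range(len(COLS)))
}
print(meeting)
```

For example, a publication sent to row {c,d} and a subscription sent to column {a,c,e} are matched at node c.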

Can use any abort/commit quorums

[Figure: 2×2 grid of nodes with rows {a,b} and {c,d}; Pi and Sj are each sent to one of the sets below]

• pub to one of {{a,b}, {c,d}}
• sub to one of {{a,c}, {a,d}, {b,c}, {b,d}}
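What makes these sets usable is presumably the quorum intersection property: every set a publication can go to overlaps every set a subscription can go to. A short check of that property for the sets listed above (a sketch; the variable names are assumptions):

```python
PUB_QUORUMS = [{"a", "b"}, {"c", "d"}]
SUB_QUORUMS = [{"a", "c"}, {"a", "d"}, {"b", "c"}, {"b", "d"}]

# Every (pub set, sub set) pair must share at least one node,
# otherwise some publication could miss some subscription.
all_intersect = all(p & s for p in PUB_QUORUMS for s in SUB_QUORUMS)
print(all_intersect)
```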

Generic Distributed Matching

[Figure: 3×2 grid of nodes a to f; Pi goes to any row, Sj goes to any column]

• pub to one of {a,b}, {c,d}, {e,f}
• sub to one of {a,c,e}, {b,d,f}

Issues:
• Replicated data (subs)
• Load balance
• Total work

Simple Cost Model

[Figure: 3×2 grid of nodes a to f; Pi goes to any row, Sj goes to any column]

• At node with x subs, y pubs: work(x,y), data(x)

• Ex: work(x,y) = xy (case (i))
      work(x,y) = y (case (ii))
      data(x) = x

• For 6 node grid: s subs, p pubs
• Each node: s/2 subs, p/3 pubs
• Scenario I: total data = 6*data(s/2) = 3s
              total work = 6*work(s/2, p/3) = 6(p/3) = 2p (case (ii))
• Scenario II: total data = 3s (same as before)
               total work = 6*work(s/2, p/3) = 6(s/2)(p/3) = sp (case (i))

• Compare to single node scenario:
      total data = s
      total work = p (case (ii)) or sp (case (i))
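The arithmetic in the cost model can be reproduced with a short Python sketch. It assumes, as the slide does implicitly, that subs and pubs are spread evenly over columns and rows; the function name `grid_costs` and the sample values s = 600, p = 300 are illustrative only.

```python
from fractions import Fraction


def grid_costs(s, p, rows, cols, work, data):
    """Total data and total work for a rows x cols grid under uniform load."""
    nodes = rows * cols
    subs_per_node = Fraction(s, cols)  # each sub is copied to one whole column
    pubs_per_node = Fraction(p, rows)  # each pub is sent to one whole row
    total_data = nodes * data(subs_per_node)
    total_work = nodes * work(subs_per_node, pubs_per_node)
    return total_data, total_work


def data(x):
    return x


def work_i(x, y):   # case (i): work proportional to subs times pubs
    return x * y


def work_ii(x, y):  # case (ii): work proportional to pubs only
    return y


s, p = 600, 300
data_ii, total_work_ii = grid_costs(s, p, rows=3, cols=2, work=work_ii, data=data)
data_i, total_work_i = grid_costs(s, p, rows=3, cols=2, work=work_i, data=data)
print(data_ii, total_work_ii, total_work_i)
```

So the grid triples the stored data in both cases; it doubles total work in case (ii) and leaves it unchanged in case (i), while spreading that work over six nodes.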

Topic Distributed Matching

Say T = { t1, t2, t3 }

[Figure: one node per topic (t1, t2, t3); Pi is sent to one topic node, Sj is sent to one topic node]

Sound familiar?
• Data fragmentation (in this case subs)
• Query localization (in this case pubs)
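A sketch of topic-based partitioning: subscriptions are fragmented by topic, and each publication is localized to the single node that owns its topic. The node names and the one-node-per-topic mapping are hypothetical.

```python
# Hypothetical assignment of each topic in T to one matching node
NODE_FOR_TOPIC = {"t1": "node1", "t2": "node2", "t3": "node3"}

# Fragment of SDB held at each node
subs_at = {node: [] for node in NODE_FOR_TOPIC.values()}


def subscribe(topic, sub_id):
    """Store the subscription only at the node that owns its topic."""
    subs_at[NODE_FOR_TOPIC[topic]].append((topic, sub_id))


def publish(topic, body):
    """Send the publication only to the owning node and match it there."""
    node = NODE_FOR_TOPIC[topic]
    return [(sub_id, body) for t, sub_id in subs_at[node] if t == topic]


subscribe("t1", "I1")
subscribe("t3", "I2")
notified_t1 = publish("t1", "B1")
notified_t2 = publish("t2", "B2")
print(notified_t1, notified_t2)
```

Compared with the generic grid, no subscription is replicated and each publication visits one node, at the price of load imbalance when some topics are much more popular than others.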

Matching with Topic Hierarchy

Topic tree:

t
  t/1
    t/1/1
    t/1/2
  t/2
  t/3
    t/3/1
    t/3/2
    t/3/3

Publication dissemination tree:

(all pubs)
  t/1,2
  t/3          subs for t/3 or for (t/3/1 or t/3/2)
    t/3/1      t/3/1 pubs seen here
    t/3/2,3    subs for t/3/2 or for (t/3/2 or t/3/3)
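To make the dissemination tree concrete, the sketch below forwards a publication down a tree shaped like the one on the slide and records which nodes see it. The parent/child structure is read off the figure and is an assumption, as are the class and function names.

```python
class Node:
    """A dissemination node that handles publications under the topics it covers."""

    def __init__(self, name, covers, children=()):
        self.name = name
        self.covers = [tuple(c.split("/")) for c in covers]
        self.children = list(children)

    def sees(self, topic):
        # A node sees a pub if one of its covered topics is a prefix of the pub's topic
        path = tuple(topic.split("/"))
        return any(path[:len(c)] == c for c in self.covers)


def disseminate(node, topic):
    """Return the names of all nodes that see a publication on `topic`."""
    if not node.sees(topic):
        return []
    visited = [node.name]
    for child in node.children:
        visited += disseminate(child, topic)
    return visited


tree = Node("all pubs", ["t"], [
    Node("t/1,2", ["t/1", "t/2"]),
    Node("t/3", ["t/3"], [
        Node("t/3/1", ["t/3/1"]),
        Node("t/3/2,3", ["t/3/2", "t/3/3"]),
    ]),
])

print(disseminate(tree, "t/3/1"))
print(disseminate(tree, "t/1/2"))
```

A subscription is then stored at the deepest node whose coverage includes everything it asks for, which is why a sub for (t/3/1 or t/3/2) sits at the t/3 node rather than at a leaf.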

NetNews worked like this

Topic tree:

t
  t/1
    t/1/1
    t/1/2
  t/2
  t/3
    t/3/1
    t/3/2
    t/3/3

[Figure: publication dissemination tree with nodes labeled t/1,2; t/1,3; t/3/1; t/3/2,3; and a second t/1,2 node (note replication)]

Discussion: Twitter

[Figure: "follows" graph over users a, b, c, d, e]

• S(e) = {a, d}
• pubs by e have description "e"
• body of pub is 140 char max
• users periodically check for notifications

Discussion: Twitter

[Figure: "follows" graph over users a, b, c, d, e]

follows inv lists:
  S(a): b
  S(b): a, c, d, e
  S(c): d
  S(d): -
  S(e): a, d

is-followed inv lists:
  S⁻¹(a): b, e
  S⁻¹(b): a
  S⁻¹(c): b
  S⁻¹(d): b, c, e
  S⁻¹(e): b
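The second set of lists is just the inverse of the first, which a few lines of Python can confirm. The dictionary layout and the function name `invert` are illustrative choices.

```python
# S(x) as given in the "follows inv lists" above
S = {
    "a": ["b"],
    "b": ["a", "c", "d", "e"],
    "c": ["d"],
    "d": [],
    "e": ["a", "d"],
}


def invert(lists):
    """Build the inverse lists: u appears in inv[v] exactly when v appears in lists[u]."""
    inv = {user: [] for user in lists}
    for u in sorted(lists):
        for v in lists[u]:
            inv[v].append(u)
    return inv


S_inv = invert(S)
for user in sorted(S_inv):
    print(user, S_inv[user])
```

Keeping both directions costs twice the space but lets the system answer "who must be notified when x publishes" and "whose publications should x see" with a single lookup each.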

Strawman Architecture - Centralized

[Figure: users connect to a frontend, which connects to a single backend]

Backend stores:
• log (all pubs)
• pubs by user (a, b, c, d, ...)
• notify by user (a, b, c, d, ...)
• follows inv lists
• is-followed inv lists

Question: Advantages/disadvantages of "notify by user" storage? Of "pubs by user"?

Strawman Architecture - Distributed

[Figure: users connect to several frontends, which share a single backend]

Backend stores:
• log (all pubs)
• pubs by user (a, b, c, d, ...)
• notify by user (a, b, c, d, ...)
• follows inv lists
• is-followed inv lists

How do we split back end???

Strawman Architecture - Distributed

[Figure: several frontends connect to two backends]

backend 1:
• log (1/2 pubs)
• pubs by user (a, c, ...)
• notify by user (a, c, ...)
• follows inv lists
• is-followed inv lists

backend 2:
• log (1/2 pubs)
• pubs by user (b, d, ...)
• notify by user (b, d, ...)
• follows inv lists
• is-followed inv lists

Dynamic Dissemination Tree

Topic tree:

t
  t/1
    t/1/1
    t/1/2
  t/2
  t/3
    t/3/1
    t/3/2
    t/3/3

[Figure: publication dissemination tree with a root (all pubs) and nodes t/1,2 and t/3. A new node with interest t/1 comes within wireless range; the tree now also shows a t/1 node.]

Dynamic Dissemination Tree

Topic tree:

t
  t/1
    t/1/1
    t/1/2
  t/2
  t/3
    t/3/1
    t/3/2
    t/3/3

[Figure: publication dissemination tree with a root (all pubs) and nodes t/1,2, t/3, and t/1. A new node with interest t/1/1 and t/3/2 has joined; the tree now also shows t/1/1 and t/3/2 nodes.]

Dissemination nodes can also publish:
• new pub 1: t/1/2
• new pub 2: t/3/2

Matching at One Node

• Set { [Qj, Ij] } of stored subscriptions
• Match one publication p from stream

[Figure: publication p[D,B] arrives at a node holding { [Qj, Ij] }]

• Index of queries Qj different from index of descriptions (content)

Example

Subscriptions:
  s1 (a,b)
  s2 (a,d)
  s3 (a,d,e)
  s4 (b,f)
  s5 (c,d,e,f)

Sample publication: a c a f b c

Inverted Lists:
  a: s1 s2 s3
  b: s1 s4
  c: s5
  d: s2 s3 s5
  e: s3 s5
  f: s4 s5

• Match semantics for example: s1=(a,b) matches pub if both a and b appear in pub

• Can generalize, e.g., sj = a ∧ (b ∨ c): handle as two subs, sj1 = (a ∧ b), sj2 = (a ∧ c)

Example

(Same subscriptions s1 to s5, inverted lists, and sample publication "a c a f b c".)

• Intersection of lists (= null) not useful
• Union of lists {s1,s2,s3,s4,s5} gives candidate subscriptions
• Need to check each candidate (e.g., s1 matches, s2 does not)
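The union-then-check approach can be sketched directly from the example data. Function names are illustrative; the subscriptions and the sample publication are the ones given in the notes.

```python
SUBS = {
    "s1": {"a", "b"},
    "s2": {"a", "d"},
    "s3": {"a", "d", "e"},
    "s4": {"b", "f"},
    "s5": {"c", "d", "e", "f"},
}


def build_inverted_lists(subs):
    """Map each term to the list of subscriptions that contain it."""
    lists = {}
    for sub_id in sorted(subs):
        for term in subs[sub_id]:
            lists.setdefault(term, []).append(sub_id)
    return lists


def match_by_union(pub_words, subs, lists):
    """Union the lists of the pub's words, then verify every candidate in full."""
    words = set(pub_words)
    candidates = set()
    for w in words:
        candidates.update(lists.get(w, []))
    matched = sorted(s for s in candidates if subs[s] <= words)
    return sorted(candidates), matched


LISTS = build_inverted_lists(SUBS)
candidates, matched = match_by_union("a c a f b c".split(), SUBS, LISTS)
print(candidates, matched)
```

Here all five subscriptions become candidates but only s1 and s4 survive the check, which is the wasted work the counting and key methods try to avoid.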

Counting Method

Subscriptions:
  s1 (a,b)
  s2 (a,d)
  s3 (a,d,e)
  s4 (b,f)
  s5 (c,d,e,f)

Sample publication: a c a f b c
Distinct word set: a b c f

Inverted Lists:
  a: s1 s2 s3
  b: s1 s4
  c: s5
  d: s2 s3 s5
  e: s3 s5
  f: s4 s5

Counters (initial state):
  sub   total   count
  s1    2       0
  s2    2       0
  s3    3       0
  s4    2       0
  s5    4       0

• As we union lists, count number of times each sub appears

• If count = total, then sub matches
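A minimal sketch of the counting method on the same example. Each subscription's total is its number of terms; counts are incremented while walking the inverted lists of the publication's distinct words. Names are illustrative.

```python
from collections import Counter

SUBS = {
    "s1": {"a", "b"},
    "s2": {"a", "d"},
    "s3": {"a", "d", "e"},
    "s4": {"b", "f"},
    "s5": {"c", "d", "e", "f"},
}

# Inverted lists: term -> subscriptions containing that term
LISTS = {}
for sub_id in sorted(SUBS):
    for term in SUBS[sub_id]:
        LISTS.setdefault(term, []).append(sub_id)


def match_by_counting(pub_words, subs, lists):
    """Count list appearances per sub; a sub matches when count equals its total."""
    total = {s: len(terms) for s, terms in subs.items()}
    count = Counter()
    for w in set(pub_words):  # distinct word set, so repeats are counted once
        for s in lists.get(w, []):
            count[s] += 1
    matched = sorted(s for s in count if count[s] == total[s])
    return count, matched


count, matched = match_by_counting("a c a f b c".split(), SUBS, LISTS)
print(dict(count), matched)
```

Using the distinct word set matters: counting the repeated "a" and "c" twice would inflate the counters and could produce false matches.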

Key Method

Subscriptions:
  s1 (a,b)
  s2 (a,d)
  s3 (a,d,e)
  s4 (b,f)
  s5 (c,d,e,f)

Sample publication: a c a f b c
Distinct word set: a b c f
Occurrence table: a, b, c, f

Inverted Lists:
  a: s1 [1, (b)]   s2 [1, (d)]   s3 [2, (d,e)]
  b: s4 [1, (f)]
  c: s5 [3, (d,e,f)]
  d: null
  e: null
  f: null

• sub in only one inverted list
• each IL entry contains other terms in sub
• occurrence table is for fast hash lookup
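A sketch of the key method on the same example. The notes do not say how a subscription's key term is chosen; this sketch assumes the alphabetically first term, which reproduces the lists shown above. A Python set plays the role of the occurrence table.

```python
SUBS = {
    "s1": {"a", "b"},
    "s2": {"a", "d"},
    "s3": {"a", "d", "e"},
    "s4": {"b", "f"},
    "s5": {"c", "d", "e", "f"},
}


def build_key_lists(subs):
    """Store each sub under one key term only, together with its remaining terms."""
    lists = {}
    for sub_id in sorted(subs):
        key, *rest = sorted(subs[sub_id])  # assumption: key = alphabetically first term
        lists.setdefault(key, []).append((sub_id, rest))
    return lists


def match_by_key(pub_words, lists):
    """For each distinct pub word, check its entries' other terms in the occurrence table."""
    occurrence = set(pub_words)  # hash-based membership test
    matched = []
    for w in occurrence:
        for sub_id, rest in lists.get(w, []):
            if all(term in occurrence for term in rest):
                matched.append(sub_id)
    return sorted(matched)


KEY_LISTS = build_key_lists(SUBS)
matched = match_by_key("a c a f b c".split(), KEY_LISTS)
print(KEY_LISTS, matched)
```

Because each subscription lives in exactly one list, it is examined at most once per publication, and never at all if its key term is absent from the publication.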

Summary: Publish/Subscribe

• P/S semantics
• Various query/description models
• Distributed matching
• Matching at one node