Page 1: The Hidden Pub/Sub of Spotify

The Hidden Pub/Sub of Spotify

Vinay Setty, Gunnar Kreitz, Roman Vitenberg, Maarten van Steen, Guido Urdaneta, Staffan Gimåker

Page 3: The Hidden Pub/Sub of Spotify

What is Spotify?

• Over 20 million users

• Fast streaming

• Legal

• Social interaction

• On-demand, peer-assisted music streaming

• Large catalogue of over 20 million tracks

• Available in the US and 55 other countries

This image is from the Wikimedia Commons (CC BY-SA 3.0)

Page 4: The Hidden Pub/Sub of Spotify

Online feed from friends and artists

Page 5: The Hidden Pub/Sub of Spotify

Offline feed from friends and artists

Page 6: The Hidden Pub/Sub of Spotify

Spotify Social Interaction

[Figure: a Spotify user (the subscriber) follows topics: music, playlists, artists, Facebook friends, and Spotify friends. Edge labels: follow; listened to track; listened/starred track; playlist activity; playlist created/updated; album released; friend joined Spotify.]

Pub/Sub for Social Interaction!

Page 7: The Hidden Pub/Sub of Spotify

Spotify Pub/Sub

• In 2013 more than 20% of active Spotify users were using pub/sub

• More than 1 TB of pub/sub data is sent/received every day

• More than a dozen engineers work full time to maintain and improve it

• Design decisions change over time

Page 8: The Hidden Pub/Sub of Spotify

Contributions

• A case study of pub/sub for social interaction

• Spotify pub/sub architecture overview

• Analysis of a real-world pub/sub workload:

• Traces collected from the production system

• Subscription workload distributions

• Publication event rate distribution

• Pub/sub traffic analysis

Page 9: The Hidden Pub/Sub of Spotify

Design Challenges for Spotify Pub/Sub

• Billions of notifications every day

• Millions of users to be served at any time

• Distributed across 3 data centers (sites)

• Different notification types

• Online feed

• Persisted and Offline feed

• Synchronization across devices
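To get a feel for these volumes, a rough back-of-envelope calculation helps. The sketch below assumes exactly one billion notifications and one decimal terabyte of pub/sub data per day as lower bounds (the slides only say "billions" and "more than 1 TB"), and it assumes the load is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Average per-second rate for a quantity spread evenly over one day."""
    return daily_total / SECONDS_PER_DAY


# Assumed lower bounds; the slides say "billions" and "more than 1 TB".
notifications_per_day = 1e9
bytes_per_day = 1e12

print(f"{per_second(notifications_per_day):,.0f} notifications/s")
print(f"{per_second(bytes_per_day) / 1e6:.1f} MB/s")
```

Under these assumptions the averages come to roughly 11,600 notifications per second and about 11.6 MB/s; peak-hour rates would be higher.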

Page 10: The Hidden Pub/Sub of Spotify

Overview of Pub/Sub Architecture

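[Figure: clients (publishers and subscribers) connect over the Internet to Access Points in the Spotify backend. Behind the Access Points sit the Pub/Sub Engine and the Notification Module, which is backed by a database. Legend: subscription data, publication events.]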

Page 11: The Hidden Pub/Sub of Spotify

Notification Types

Publisher Service   Notification Type                        Pub/Sub Module
Presence Service    friend-feed                              Pub/Sub Engine
Playlist Service    friend-feed, in-client, push and email   Pub/Sub Engine and Notification Module
Artist Service      in-client, push and email                Notification Module
Social Service      in-client, push and email                Notification Module
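The mapping in this table can be read as a small routing table from publisher service to the module that handles its events. The following is only an illustrative sketch: the key names, the `Module` enum, and `modules_for` are hypothetical and do not come from Spotify's code.

```python
from enum import Enum


class Module(Enum):
    PUBSUB_ENGINE = "Pub/Sub Engine"
    NOTIFICATION_MODULE = "Notification Module"


# Transcribed from the slide's table; the key names are illustrative.
ROUTES = {
    "presence": {Module.PUBSUB_ENGINE},
    "playlist": {Module.PUBSUB_ENGINE, Module.NOTIFICATION_MODULE},
    "artist": {Module.NOTIFICATION_MODULE},
    "social": {Module.NOTIFICATION_MODULE},
}


def modules_for(publisher: str) -> set:
    """Return the modules that handle events from the given publisher service."""
    try:
        return ROUTES[publisher]
    except KeyError:
        raise ValueError(f"unknown publisher service: {publisher!r}") from None
```

For example, `modules_for("playlist")` returns both modules, reflecting that playlist events feed the real-time friend feed as well as in-client, push, and e-mail notifications.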

Page 12: The Hidden Pub/Sub of Spotify

Detailed Pub/Sub Architecture

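[Figure: detailed architecture. Clients (publishers and subscribers) reach the Spotify backend through Access Points over the Internet. The Pub/Sub Engine handles subscription data and publication events. The Notification Module consists of a Notification Service, a Rule Engine, and a Cassandra cluster (the database). Arrow labels include "events", "notifications", and "timestamps".]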

Page 13: The Hidden Pub/Sub of Spotify

Notification Module

• Determines notification type using Rule Engine

• Notification types

• In-client (for desktop clients)

• Push notifications (for mobile clients)

• Batch e-mail for offline users

• Notification persistence
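A rule engine of this kind could be sketched as a function from user state to delivery channel. The slides do not describe the actual rules, so the `UserState` fields and the order in which the cases are checked below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    online: bool
    device: str  # "desktop" or "mobile"; hypothetical field


def choose_notification_type(user: UserState) -> str:
    """Pick a delivery channel following the three cases named on the slide."""
    if not user.online:
        return "batch-email"  # offline users get batched e-mail
    if user.device == "mobile":
        return "push"         # mobile clients get push notifications
    return "in-client"        # desktop clients get in-client notifications
```

A real rule engine would likely also weigh user preferences and event type; this sketch only mirrors the three bullet points.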

Page 14: The Hidden Pub/Sub of Spotify

Offline Event Retrieval

[Figure: animation over the detailed architecture, with the Artist Monitoring Service as publisher. Labels, in apparent order: the client follows Paul McCartney; the client goes offline; "Paul McCartney released an album" is published while the client is offline; the client comes back online and sends a pull request; the album notification is delivered.]
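The offline retrieval flow on this slide (persist events, then let a returning client pull what it missed) can be sketched with a timestamp per user and topic. This is a minimal in-memory stand-in, not Spotify's implementation: the `NotificationStore` class and its methods are hypothetical, and a plain dictionary takes the place of the Cassandra cluster.

```python
from collections import defaultdict


class NotificationStore:
    """In-memory stand-in for the persistent notification store."""

    def __init__(self):
        self._events = defaultdict(list)  # topic -> list of (timestamp, payload)
        self._last_seen = {}              # (user, topic) -> newest timestamp already delivered

    def follow(self, user, topic, timestamp):
        """Start following: only events after this timestamp will be delivered."""
        self._last_seen[(user, topic)] = timestamp

    def publish(self, topic, timestamp, payload):
        """Persist an event regardless of whether followers are online."""
        self._events[topic].append((timestamp, payload))

    def pull(self, user, topic):
        """Return payloads newer than the user's last-seen timestamp, oldest first."""
        key = (user, topic)
        if key not in self._last_seen:
            return []  # the user does not follow this topic
        since = self._last_seen[key]
        fresh = sorted(
            (e for e in self._events.get(topic, []) if e[0] > since),
            key=lambda e: e[0],
        )
        if fresh:
            self._last_seen[key] = fresh[-1][0]  # advance the timestamp
        return [payload for _, payload in fresh]
```

With this sketch, a client that follows a topic at time 10, misses an event published at time 20, and then calls `pull` receives that one event; a second `pull` returns nothing because the stored timestamp has advanced.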

Page 15: The Hidden Pub/Sub of Spotify

Pub/Sub Engine

• Aggregators aggregate subscriptions and distribute publications

• Ring of pub/sub brokers:

• Manage subscriptions (subscribe and unsubscribe)

• Match publications

• Forward matched publications to aggregators

• Cross-site forwarding

• Load balancing
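The aggregator role described above (collapse many client subscriptions into one per topic, then fan publications back out) can be sketched as follows. The class, its callbacks, and the client identifiers are hypothetical; the slides do not specify how aggregation is implemented.

```python
from collections import defaultdict


class Aggregator:
    """Collapses per-client subscriptions into one broker subscription per topic."""

    def __init__(self, broker_subscribe, broker_unsubscribe):
        self._subscribers = defaultdict(set)  # topic -> client ids on this Access Point
        self._broker_subscribe = broker_subscribe
        self._broker_unsubscribe = broker_unsubscribe

    def subscribe(self, client, topic):
        first = not self._subscribers[topic]
        self._subscribers[topic].add(client)
        if first:
            self._broker_subscribe(topic)  # only the first subscriber reaches the broker

    def unsubscribe(self, client, topic):
        clients = self._subscribers.get(topic)
        if not clients or client not in clients:
            return
        clients.discard(client)
        if not clients:
            del self._subscribers[topic]
            self._broker_unsubscribe(topic)  # last subscriber left

    def on_publication(self, topic, event):
        """Fan a matched publication out to local subscribers."""
        return {client: event for client in self._subscribers.get(topic, ())}
```

The point of the design is that the brokers see at most one subscription per topic per Access Point, however many clients on that Access Point follow the topic.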

Page 16: The Hidden Pub/Sub of Spotify

Pub/Sub Engine

[Figure: clients connect to Access Points AP1 and AP2; each Access Point has its own aggregator, and the aggregators connect to pub/sub brokers (A, B, ...) joined by broker-overlay links in the Spotify backend. Publishers feed the brokers. "Follow Paul McCartney" subscriptions from several clients travel through the aggregators to a broker. Legend: subscription data, publication events.]

Aggregators:
1. One-to-one per AP
2. Aggregate subscriptions
3. Distribute publications

Pub/Sub brokers:
1. Manage subscriptions
2. Match and forward publications
3. Organized as a DHT to partition subscriptions
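Partitioning subscriptions across brokers means that each topic has exactly one responsible broker. The slides do not say which DHT scheme is used, so the sketch below substitutes the simplest possible one, hashing the topic name modulo the number of brokers; `broker_for` and the broker names are hypothetical.

```python
import hashlib


def broker_for(topic: str, brokers: list) -> str:
    """Map a topic to one broker by hashing its name (simple modulo partitioning)."""
    if not brokers:
        raise ValueError("at least one broker is required")
    digest = hashlib.sha256(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


brokers = ["A", "B", "C"]
# Every aggregator computes the same owner for the same topic.
owner = broker_for("artist:paul-mccartney", brokers)
```

Modulo hashing reshuffles most topics when a broker is added or removed; a production DHT would more plausibly use consistent hashing to limit that movement.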

Page 17: The Hidden Pub/Sub of Spotify

Connecting Users Across Sites in Real-Time

[Figure: three sites, Stockholm (Sweden), London (UK), and Ashburn (USA), each with brokers A, B, and C joined by broker-DHT links. A user connected to the Stockholm site follows Paul McCartney. Paul McCartney's publications ("created a playlist", "listened to 'Hey Jude'") are shown with him connected to the London site and to the Ashburn site, and are forwarded to the Stockholm site. Legend: subscription data, publication events.]

Cross-Site Replication:
1. Subscriptions replicated in each site
2. One-to-one corresponding brokers in each site
3. Matching publications forwarded across sites
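The three cross-site replication rules can be sketched with one broker object per site. Everything here is a simplification under stated assumptions: `SiteBroker` and `connect` are hypothetical names, replication is synchronous and in-process, and each broker tracks only which sites have at least one subscriber for a topic.

```python
from collections import defaultdict


class SiteBroker:
    """One broker in one site; peers are the corresponding brokers in other sites."""

    def __init__(self, site):
        self.site = site
        self.peers = []                        # corresponding brokers in the other sites
        self.subscriptions = defaultdict(set)  # topic -> sites with at least one subscriber
        self.delivered = []                    # (topic, event) pairs handed to local aggregators

    def subscribe(self, topic):
        """Record a local subscription and replicate it to every peer."""
        for broker in [self, *self.peers]:
            broker.subscriptions[topic].add(self.site)

    def publish(self, topic, event):
        """Match a publication and forward it only to sites that have subscribers."""
        interested = self.subscriptions.get(topic, ())
        for broker in [self, *self.peers]:
            if broker.site in interested:
                broker.delivered.append((topic, event))


def connect(brokers):
    """Link corresponding brokers one-to-one across sites."""
    for broker in brokers:
        broker.peers = [peer for peer in brokers if peer is not broker]
```

Because subscriptions are replicated everywhere, the publishing site can match locally and send a publication across sites only when another site actually has a subscriber.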

Page 18: The Hidden Pub/Sub of Spotify

Workload Analysis

• Traces from production system

• Mostly collected at Stockholm site

• From Thursday, 10 Jan 2013 to Saturday, 19 Jan 2013

• Study of subscription and publication workload

• Pub/Sub traffic trends and analysis

Page 19: The Hidden Pub/Sub of Spotify

Topic & Subscription Distributions

[Figure: CCDF for topic popularity or subscription size. x-axis: topic popularity or subscription size; y-axis: % of topics or % of subscribers. Series: topic popularity (CCDF), subscription size (CCDF).]

• Topic popularity: percentage of all subscribers that subscribe to a topic

• Subscription size: percentage of all topics that a subscriber subscribes to

• Power-law-like distribution (judged visually from the log-log plot)

• Similar to the degree distribution in a Twitter social graph

• Most topics have very few subscribers

• Most subscribers are interested in very few topics

99% of topics have < 0.001% of all subscribers

99% of users subscribe to < 0.001% of all topics

CCDF = Complementary Cumulative Distribution Function
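For readers unfamiliar with the CCDF, the sketch below computes an empirical one from a list of observations, such as the number of subscribers per topic. It uses the convention P(X > x); the slides do not say which convention the plots use, and log-log plots often use P(X >= x) instead so that the last point is not zero. The function name and the sample data are illustrative only.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (x, P(X > x)) pairs for a list of observations."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # observations strictly greater than x
        points.append((x, remaining / n))
    return points


# Made-up example: subscriber counts for four topics.
print(ccdf([1, 1, 2, 4]))  # [(1, 0.5), (2, 0.25), (4, 0.0)]
```

Plotting such points on log-log axes gives a roughly straight line when the data follow a power law, which is the visual test the slide refers to.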

Page 20: The Hidden Pub/Sub of Spotify

Publication Event Rate Distribution

[Figure: CCDF for publication event rate. x-axis: % of total publication event rate; y-axis: % of topics.]

• Publications generated per topic per day

• Not a power-law-like distribution

• Most topics generate a very low event rate

99% of topics generate less than 0.001% of all publications

Page 21: The Hidden Pub/Sub of Spotify

Notification Rate (NR) per Subscriber

[Figure: CCDF for notification rate per subscriber. x-axis: notification rate (NR) per subscriber; y-axis: % of subscribers.]

• Defined as the percentage of daily publications that a subscriber receives

• Not a power-law-like distribution

• Varies across subscribers from 1% to as low as 10^-7%

• Most subscribers have a very low notification rate

90% of users receive less than 0.001% of all publications

Page 22: The Hidden Pub/Sub of Spotify

Correlation between Topic Popularity & Event Rate

NR per subscriber is linearly proportional to Subscription Size

Page 23: The Hidden Pub/Sub of Spotify

Correlation between Subscription Size & NR

NR per subscriber is linearly proportional to Subscription Size

Page 24: The Hidden Pub/Sub of Spotify

Publication Traffic

• Daily periodic pattern of publication traffic

• Maximum around 6 pm to 7 pm

• Minimum around 2 am

• Complements the design

• Online traffic dominates

• Offline traffic contributes the least

[Figure: publication traffic per publisher. x-axis: UTC time (day - hour), Thursday through Saturday of the following week; y-axis: % of daily publication traffic. Series: total traffic; music playback traffic (online); playlist update traffic (online); Notification Module traffic (persisted/offline).]

Page 25: The Hidden Pub/Sub of Spotify

Cross-Site Traffic

• Publication traffic within the sites dominates

• Cross-site traffic is an order of magnitude lower

• Confirms the scalability of the cross-site forwarding design

[Figure: publication traffic within the sites vs. across the sites. x-axis: UTC time (day - hour), Thursday through Saturday of the following week; y-axis: % of daily publication traffic. Series: traffic within a site; traffic across sites.]

Page 26: The Hidden Pub/Sub of Spotify

Online Subscription Traffic

• Client logins and logouts result in subscriptions and unsubscriptions

• Exhibits a daily periodic pattern

• Unsubscription traffic follows the same pattern as subscription traffic

• Short-lived subscriptions: mean subscription validity of approximately 2 hours

[Figure: subscription and unsubscription traffic. x-axis: UTC time (day - hour), Thursday through Saturday of the following week; y-axis: % of daily subscriptions. Series: subscription rate; unsubscription rate.]

Page 27: The Hidden Pub/Sub of Spotify

Total #Subscriptions

• Exhibits a daily pattern

• The subscription count at any point is dominated by the Presence service

• Playlist and notification subscriptions are significantly fewer

[Figure: pattern of percentage of total #subscriptions. x-axis: UTC time (day - hour), Thursday through Saturday of the following week; y-axis: % of daily subscriptions (log scale). Series: total subscriptions; presence subscriptions; playlist subscriptions; notification subscriptions.]

Page 28: The Hidden Pub/Sub of Spotify

Conclusions

• Pub/Sub is used at Spotify for social interaction among Spotify users

• Hybrid architecture to support online and offline notifications

• Workload similar to that of a Twitter social graph

• Daily periodic patterns in pub/sub traffic

• Design complements the workload & traffic

Page 29: The Hidden Pub/Sub of Spotify

Thank you! Questions?