Socialite, the Open Source Status Feed, Part 1: Design Overview and Scaling for Infinite Content

Post on 08-Sep-2014

Building a Social Platform

Part 1: Design Overview; Storing Infinite Content

Solutions Engineering

• Identify Popular Use Cases
  – Directly from MongoDB Users
  – Addressing "limitations"

• Go beyond documentation and blogs
• Create open source project
• Run it!

Social Status Feed

Agenda

• What is a status feed and why build it with MongoDB
• Application overview (goals, non-goals)
• Architecture overview (arch diagram)
• Operational overview (benchmarks, automation)
• Describe components
  – Describe options

• For each component
  – Options tried
  – Results
  – Option chosen

Socialite

• News/Social Status Feed: popular and common

• Looks deceptively simple, but getting good performance means solving many tricky problems

• We created a reference implementation
  – Configurable models and options
  – Built-in benchmarking

• Used this implementation to test out different options
• This talk will summarize the results

Status Feed

Socialite

• Open Source
• Reference Implementation
  – Various Fanout Feed Models
  – User Graph Implementation
  – Content storage

• Configurable models and options
• REST API in Dropwizard (Yammer)
  – https://dropwizard.github.io/dropwizard/

• Built-in benchmarking

https://github.com/10gen-labs/socialite

Architecture

[Architecture diagram. Labels recovered: Graph Service, Proxy, Content Proxy]

Pluggable Services

• Major components each have an interface
  – see com.mongodb.socialite.services

• Configuration selects implementation to use
• ServiceManager organizes:
  – Default implementations
  – Lifecycle
  – Binding configuration
  – Wiring dependencies
  – see com.mongodb.socialite.ServiceManager
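The pattern above (one interface per component, a default implementation, and configuration that can select another) can be sketched in a few lines. Socialite itself is Java, and this is not its ServiceManager: `FeedService`, `ServiceRegistry`, the two feed classes and the config keys are hypothetical names chosen for illustration, and lifecycle handling and dependency wiring are left out.

```python
from abc import ABC, abstractmethod


class FeedService(ABC):
    """Hypothetical interface for one pluggable component."""

    @abstractmethod
    def name(self) -> str:
        ...


class FanoutOnReadFeed(FeedService):
    def name(self) -> str:
        return "fanout-on-read"


class FanoutOnWriteFeed(FeedService):
    def name(self) -> str:
        return "fanout-on-write"


class ServiceRegistry:
    """Resolves each component to the configured implementation, else the default."""

    def __init__(self, config):
        self._config = config      # e.g. {"feed": "write"}
        self._defaults = {}        # component key -> default implementation name
        self._choices = {}         # component key -> {implementation name: factory}

    def register(self, key, impl_name, factory, default=False):
        self._choices.setdefault(key, {})[impl_name] = factory
        if default:
            self._defaults[key] = impl_name

    def get(self, key):
        # Configuration wins; otherwise fall back to the registered default.
        impl_name = self._config.get(key, self._defaults.get(key))
        if impl_name not in self._choices.get(key, {}):
            raise KeyError(f"no implementation {impl_name!r} for {key!r}")
        return self._choices[key][impl_name]()


def build_registry(config):
    registry = ServiceRegistry(config)
    registry.register("feed", "read", FanoutOnReadFeed, default=True)
    registry.register("feed", "write", FanoutOnWriteFeed)
    return registry
```

With an empty config, `build_registry({}).get("feed")` yields the default implementation; passing `{"feed": "write"}` swaps it without touching any calling code, which is the point of keeping components behind interfaces.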

Simple Interface

GET     /users/{user_id}                      Get a user by their ID
DELETE  /users/{user_id}                      Remove a user by their ID
POST    /users/{user_id}/posts                Send a message from this user
GET     /users/{user_id}/followers            Get a list of followers of a user
GET     /users/{user_id}/followers_count      Get the number of followers of a user
GET     /users/{user_id}/following            Get the list of users this user is following
GET     /users/{user_id}/following_count      Get the number of users this user follows
GET     /users/{user_id}/posts                Get the messages sent by a user
GET     /users/{user_id}/timeline             Get the timeline for this user
PUT     /users/{user_id}                      Create a new user
PUT     /users/{user_id}/following/{target}   Follow a user
DELETE  /users/{user_id}/following/{target}   Unfollow a user
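To make the semantics of these endpoints concrete, here is a minimal in-memory sketch of the follow, post and timeline operations. It is not Socialite's implementation: the class name `InMemorySocial` is hypothetical, the timeline is built at read time by merging the posts of followed users, and it assumes a timeline contains only posts from followed users, which the slides do not state.

```python
from collections import defaultdict


class InMemorySocial:
    """Toy model of the REST operations; no persistence, no HTTP layer."""

    def __init__(self):
        self.following = defaultdict(set)   # user_id -> users they follow
        self.followers = defaultdict(set)   # user_id -> users following them
        self.posts = defaultdict(list)      # user_id -> [(sequence number, message)]
        self._seq = 0                       # stand-in for a timestamp

    def follow(self, user_id, target):      # PUT /users/{user_id}/following/{target}
        self.following[user_id].add(target)
        self.followers[target].add(user_id)

    def unfollow(self, user_id, target):    # DELETE /users/{user_id}/following/{target}
        self.following[user_id].discard(target)
        self.followers[target].discard(user_id)

    def send(self, user_id, message):       # POST /users/{user_id}/posts
        self._seq += 1
        self.posts[user_id].append((self._seq, message))

    def followers_count(self, user_id):     # GET /users/{user_id}/followers_count
        return len(self.followers[user_id])

    def timeline(self, user_id, limit=50):  # GET /users/{user_id}/timeline
        # Gather posts from everyone this user follows, newest first.
        merged = [p for u in self.following[user_id] for p in self.posts[u]]
        merged.sort(reverse=True)
        return [message for _, message in merged[:limit]]
```

Computing the timeline on every read is the simplest choice for a sketch; the fanout feed models mentioned earlier are different trade-offs around exactly this step.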

https://github.com/10gen-labs/socialite

Technical Decisions

• User timeline cache
• Schema
• Indexing
• Horizontal Scaling

Operational Setup

Real life validation of our choices.

• User-facing latency
• Linear scaling of resources

Most important criteria?

Operational Testing

Scaling Goals

• Realistic real-life-scale workload
  – compared to Twitter, etc.

• Understanding of HW required
  – containing costs

• Confirm architecture scales linearly
  – without loss of responsiveness

Architecture

[Architecture diagram. Labels recovered: Graph Service, Proxy, Content Proxy]

DB Architecture

The storage layer is separate from the Socialite services, and each service has its own URI, pointing to its own MongoDB server or cluster that can be configured differently from the others.

This allows us to physically optimize each service's DB for the workload we'll be running on it.

It also allows us to scale out the DB that's currently the limiting factor (the bottleneck) in our setup.
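As a rough illustration of one URI per service, the sketch below keeps a separate connection string for each service and checks whether two services share a host. The service keys, hostnames and database names are placeholders invented for this example, not Socialite's actual configuration format.

```python
from urllib.parse import urlparse

# Hypothetical per-service settings; every value here is a placeholder.
SERVICE_URIS = {
    "user_graph": "mongodb://graph-db.example.com:27017/socialite_graph",
    "content": "mongodb://content-mongos.example.com:27017/socialite_content",
    "feed": "mongodb://feed-mongos.example.com:27017/socialite_feed",
}


def hosts_by_service(uris):
    """Map each service name to the host its URI points at."""
    return {service: urlparse(uri).hostname for service, uri in uris.items()}


def shares_storage(uris, a, b):
    """True if two services are configured against the same host."""
    hosts = hosts_by_service(uris)
    return hosts[a] == hosts[b]
```

Because each service resolves its own URI, moving the bottlenecked service to a larger or sharded cluster is a one-line configuration change that leaves the other services untouched.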

Operational Testing

Built-in benchmark capability

Operational Testing

• All hosts in AWS
• Each service used its own DB, cluster or shards
• All benchmarks through `mongos` (sharded config)
• Used MMS monitoring for measuring throughput
• Used internal benchmarks for measuring latency
• Test volume based on real-life social metrics

Scaling for Infinite Content

Architecture

[Architecture diagram. Labels recovered: Graph Service, Proxy, Content Proxy]

Socialite Content Service

• System of record for all user content
• Initially very simple (no search)
• Mainly designed to support feed
  – Lookup/indexed by _id and userid
  – Time based anchors/pagination
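Anchor-based pagination can be sketched as follows: instead of skipping N results, the client passes the `_id` of the last post it saw and asks for the next batch of older posts. This in-memory version is only an illustration. It assumes `_id` values increase with time (plain integers stand in for time-ordered ids), and the function name and document fields other than `_id` and `userid` are hypothetical.

```python
def page_of_posts(posts, user_id, limit, anchor=None):
    """Return up to `limit` posts by `user_id`, newest first.

    If `anchor` is given, only posts strictly older than it are returned.
    """
    mine = [p for p in posts if p["userid"] == user_id]
    if anchor is not None:
        mine = [p for p in mine if p["_id"] < anchor]
    mine.sort(key=lambda p: p["_id"], reverse=True)
    return mine[:limit]


# Ten sample posts; odd ids belong to "bob", even ids to "carol".
posts = [
    {"_id": i, "userid": "bob" if i % 2 else "carol", "text": f"post {i}"}
    for i in range(1, 11)
]

first_page = page_of_posts(posts, "bob", limit=2)
# The last item of a page becomes the anchor for the next request.
next_page = page_of_posts(posts, "bob", limit=2, anchor=first_page[-1]["_id"])
```

Against a database, the same idea would typically be a range query on `userid` and `_id` with a descending sort and a limit, so each page costs about the same regardless of how deep the reader scrolls.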

• Half life of most content is 1 day!

• Popular content usually < 1 month

• Access to old data is rare
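To get a feel for what a one-day half life implies, the sketch below treats interest in a post as decaying exponentially. That reading is an assumption made for illustration: the slides give only the half-life figure, not a decay model, and `remaining_fraction` is a hypothetical helper.

```python
def remaining_fraction(age_days, half_life_days=1.0):
    """Fraction of initial interest left after `age_days`, assuming exponential decay."""
    return 0.5 ** (age_days / half_life_days)


one_day = remaining_fraction(1)     # 0.5
one_week = remaining_fraction(7)    # 0.0078125, under 1% of the initial level
one_month = remaining_fraction(30)  # below one in a billion
```

Under this assumption, almost all reads land on content from the last few days, which is why rare access to old data matters for how the content store is laid out.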

Social Data Ages Fast
