xldb2011 tue 1005_linked_in

Data Infrastructure at LinkedIn, Shirshanka Das, XLDB 2011

Upload: liqiang-xu

Post on 11-May-2015

TRANSCRIPT

Page 1: Xldb2011 tue 1005_linked_in

Data Infrastructure at LinkedIn

Shirshanka Das

XLDB 2011

Page 2: Xldb2011 tue 1005_linked_in

Me

UCLA Ph.D. 2005 (Distributed protocols in content delivery networks)

PayPal (Web frameworks and Session Stores)

Yahoo! (Serving Infrastructure, Graph Indexing, Real-time Bidding in Display Ad Exchanges)

@ LinkedIn (Distributed Data Systems team): Distributed data transport and storage technology (Kafka, Databus, Espresso, ...)

Page 3: Xldb2011 tue 1005_linked_in

Outline

LinkedIn Products

Data Ecosystem

LinkedIn Data Infrastructure Solutions

Next Play

Page 4: Xldb2011 tue 1005_linked_in

LinkedIn By The Numbers

120,000,000+ users in August 2011

2 new user registrations per second

4 billion People Searches expected in 2011

2+ million companies with LinkedIn Company Pages

81+ million unique visitors monthly*

150K domains feature the LinkedIn Share Button

7.1 billion page views in Q2 2011

1M LinkedIn Groups

* Based on comScore, Q2 2011

Page 5: Xldb2011 tue 1005_linked_in

Member Profiles

Page 6: Xldb2011 tue 1005_linked_in

Signal - faceted stream search

Page 7: Xldb2011 tue 1005_linked_in

People You May Know

Page 8: Xldb2011 tue 1005_linked_in

Outline

LinkedIn Products

Data Ecosystem

LinkedIn Data Infrastructure Solutions

Next Play

Page 9: Xldb2011 tue 1005_linked_in

Three Paradigms: Simplifying the Data Continuum

Online: activity that should be reflected immediately

• Member Profiles

• Company Profiles

• Connections

• Communications

Nearline: activity that should be reflected soon

• Signal

• Profile Standardization

• News

• Recommendations

• Search

• Communications

Offline: activity that can be reflected later

• People You May Know

• Connection Strength

• News

• Recommendations

• Next best idea

Page 10: Xldb2011 tue 1005_linked_in

Data Infrastructure Toolbox (Online)

Capabilities

Key-value access

Rich structures (e.g. indexes)

Change capture capability

Search platform

Graph engine

Systems Analysis

Page 11: Xldb2011 tue 1005_linked_in

Data Infrastructure Toolbox (Nearline)

Capabilities

Change capture streams

Messaging for site events, monitoring

Nearline processing

Systems Analysis

Page 12: Xldb2011 tue 1005_linked_in

Data Infrastructure Toolbox (Offline)

Capabilities

Machine learning, ranking, relevance

Analytics on Social gestures

Systems Analysis

Page 13: Xldb2011 tue 1005_linked_in

Laying out the tools

Page 14: Xldb2011 tue 1005_linked_in

Outline

LinkedIn Products

Data Ecosystem

LinkedIn Data Infrastructure Solutions

Next Play

Page 15: Xldb2011 tue 1005_linked_in

Focus on four systems in Online and Nearline

Data Transport

– Kafka

– Databus

Online Data Stores

– Voldemort

– Espresso

Page 16: Xldb2011 tue 1005_linked_in

Kafka: High-Volume Low-Latency Messaging System

LinkedIn Data Infrastructure Solutions

Page 17: Xldb2011 tue 1005_linked_in

Kafka: Architecture

[Diagram: the WebTier pushes events to the Broker Tier (Topic 1, Topic 2, ..., Topic N; sequential write, sendfile). Consumers pull events through the Kafka Client Lib (Iterator 1 ... Iterator n, topic offset). Zookeeper provides offset management and topic/partition ownership. Throughput labels: 100 MB/sec, 200 MB/sec.]

Scale

Billions of Events

TBs per day

Inter-colo: few seconds

Typical retention: weeks

Guarantees

At least once delivery

Very high throughput

Low latency

Durability
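
The pull model and the at-least-once guarantee on this slide can be illustrated with a small in-memory sketch. It is not Kafka's client API: `TopicLog`, `Consumer`, and `poll` are hypothetical names, the log lives in a Python list rather than on disk, and the committed offset is a plain attribute standing in for the value the slide says is managed in Zookeeper. The point it shows is that the offset advances only after a batch has been processed, so a failure causes redelivery rather than loss.

```python
class TopicLog:
    """Append-only, in-memory stand-in for one topic partition on a broker."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Producer side: push an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def fetch(self, offset, max_events=10):
        """Consumer side: pull up to max_events starting at the given offset."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Pulls from a log and advances its offset only after processing a batch."""

    def __init__(self, log, committed_offset=0):
        self.log = log
        # Assumption: in a real deployment this value is stored externally.
        self.committed_offset = committed_offset

    def poll(self, handler, max_events=10):
        batch = self.log.fetch(self.committed_offset, max_events)
        for event in batch:
            handler(event)  # if this raises, the offset below is not advanced
        self.committed_offset += len(batch)
        return len(batch)
```

If the handler fails partway through a batch, the next `poll` starts again from the old offset, so events handled before the failure are seen twice. Consumers in this style therefore need idempotent handlers.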

Page 18: Xldb2011 tue 1005_linked_in

Databus : Timeline-Consistent Change Data Capture

LinkedIn Data Infrastructure Solutions

Page 19: Xldb2011 tue 1005_linked_in

Databus at LinkedIn

[Diagram: changes captured from the DB flow into the Relay (Event Win). Clients (Consumer 1 ... Consumer n) receive on-line changes through the Databus Client Lib. A Bootstrap DB also receives on-line changes and serves a consistent snapshot at U.]

Features

Transport independent of data source: Oracle, MySQL, …

Portable change event serialization and versioning

Start consumption from arbitrary point

Guarantees

Transactional semantics

Timeline consistency with the data source

Durability (by data source)

At-least-once delivery

Availability

Low latency
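
One way to picture "start consumption from arbitrary point" is a relay that keeps only a bounded window of recent changes, with a snapshot as the fallback for consumers that have fallen behind it. The sketch below is a simplified model, not the Databus client library: `Relay`, `catch_up`, and the dense integer sequence numbers are all assumptions made for illustration, and the snapshot is assumed to be recent enough that the relay still holds every change after it.

```python
class Relay:
    """Keeps only the most recent change events (a bounded event window)."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.events = []  # list of (scn, change), ordered by scn

    def capture(self, scn, change):
        self.events.append((scn, change))
        # Drop the oldest events once the window is full.
        del self.events[:-self.window_size]

    def oldest_scn(self):
        return self.events[0][0] if self.events else None

    def since(self, scn):
        """Events with a sequence number strictly greater than scn."""
        return [e for e in self.events if e[0] > scn]


def catch_up(relay, snapshot, snapshot_scn, last_scn):
    """Return (steps, new_scn) for a consumer that last applied last_scn.

    snapshot is a dict that is consistent as of snapshot_scn.
    """
    steps = []
    oldest = relay.oldest_scn()
    if oldest is not None and last_scn < oldest - 1:
        # The relay no longer holds the next event: load the snapshot first.
        steps.extend(("snapshot", k, v) for k, v in sorted(snapshot.items()))
        last_scn = snapshot_scn
    for scn, change in relay.since(last_scn):
        steps.append(("event", scn, change))
        last_scn = scn
    return steps, last_scn
```

A consumer that is only slightly behind gets events straight from the relay. One that is far behind gets the snapshot rows followed by the events newer than the snapshot, which preserves the source's commit order.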

Page 20: Xldb2011 tue 1005_linked_in

Voldemort: Highly-Available Distributed Data Store

LinkedIn Data Infrastructure Solutions

Page 21: Xldb2011 tue 1005_linked_in

Voldemort: Architecture

Highlights

• Open source

• Pluggable components

• Tunable consistency / availability

• Key/value model, server side “views”

In production

• Data products

• Network updates, sharing, page view tracking, rate-limiting, more…

• Future: SSDs, multi-tenancy
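
The slide does not say how consistency and availability are tuned. A common approach in Dynamo-style stores is to choose, per store, a replication factor N and the number of replicas that must answer a read (R) or acknowledge a write (W). The sketch below assumes that scheme; `quorum_properties` is a hypothetical helper that only does the arithmetic and is not part of Voldemort.

```python
def quorum_properties(n, r, w):
    """Summarize the trade-off for N replicas, R required reads, W required writes."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return {
        # If R + W > N, every read quorum shares a replica with every write quorum.
        "quorums_overlap": r + w > n,
        # Number of replicas that can be down while the operation still succeeds.
        "read_tolerates_failures": n - r,
        "write_tolerates_failures": n - w,
    }
```

With N=3, R=2, W=2 the quorums overlap and each operation survives one replica failure. With N=3, R=1, W=1 each operation survives two failures, but a read may miss the most recent write.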

Page 22: Xldb2011 tue 1005_linked_in

Espresso: Indexed Timeline-Consistent Distributed Data Store

LinkedIn Data Infrastructure Solutions

Page 23: Xldb2011 tue 1005_linked_in

Espresso: Key Design Points

Hierarchical data model

– InMail, Forums, Groups, Companies

Native Change Data Capture Stream

– Timeline consistency

– Read after Write

Rich functionality within a hierarchy

– Local Secondary Indexes

– Transactions

– Full-text search

Modular and Pluggable

– Off-the-shelf: MySQL, Lucene, Avro

Page 24: Xldb2011 tue 1005_linked_in

Application View

Page 25: Xldb2011 tue 1005_linked_in

Partitioning

Page 26: Xldb2011 tue 1005_linked_in

Partition Layout: Master, Slave

3 Storage Engine nodes, 2 way replication

[Diagram, as recoverable from its labels]

Cluster Manager: partition-to-node map (Partition: P.1, Node: 1 ... Partition: P.12, Node: 3)

Database: per-node partition state (Node: 1; M: P.1 – Active; S: P.5 – Active)

Cluster Node 1: Master P.1 P.2 P.3 P.4; Slave P.5 P.6 P.9 P.10

Node 2: Master P.5 P.6 P.7 P.8; Slave P.1 P.2 P.11 P.12

Node 3: Master P.9 P.10 P.11 P.12; Slave P.3 P.4 P.7 P.8
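
The 12-partition, 3-node, 2-way-replicated layout on this slide can be generated by a simple arithmetic rule. The sketch below is one rule that matches the diagram labels as read here (node 1 masters P.1 and holds P.5 as a slave, node 3 masters P.12). Whether Espresso's cluster manager computes placements this way is not stated in the talk, and `layout` is a hypothetical function.

```python
def layout(num_partitions, num_nodes):
    """Assign one master and one slave node to each partition.

    Masters are contiguous blocks. Each block's slaves are split evenly
    across the other nodes, in node order.
    """
    if num_nodes < 2 or num_partitions % num_nodes != 0:
        raise ValueError("need at least 2 nodes and an even split of partitions")
    per_node = num_partitions // num_nodes
    if per_node % (num_nodes - 1) != 0:
        raise ValueError("each master block must split evenly across the other nodes")
    chunk = per_node // (num_nodes - 1)  # slave partitions sent to each other node
    assignment = {}
    for p in range(num_partitions):
        master = p // per_node
        others = [n for n in range(num_nodes) if n != master]
        slave = others[(p % per_node) // chunk]
        assignment[f"P.{p + 1}"] = {"master": master + 1, "slave": slave + 1}
    return assignment
```

Calling `layout(12, 3)` places four masters and four slaves on each node, and no partition has its master and slave on the same node, so losing one node leaves a copy of every partition available.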

Page 27: Xldb2011 tue 1005_linked_in

Espresso: API

REST over HTTP

Get Messages for bob

– GET /MailboxDB/MessageMeta/bob

Get MsgId 3 for bob

– GET /MailboxDB/MessageMeta/bob/3

Get first page of Messages for bob that are unread and in the inbox

– GET /MailboxDB/MessageMeta/bob/?query="+isUnread:true +isInbox:true"&start=0&count=15
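
The three requests above follow a /database/table/key[/sub-key] pattern with optional query parameters. A client could assemble such URLs as in the sketch below. `resource_url` is a hypothetical helper, not part of any Espresso client, and it percent-encodes the query value, which the slide shows unencoded for readability. Whether the surrounding quotation marks on the slide belong to the query value is unclear, so the sketch leaves them out.

```python
from urllib.parse import quote, urlencode


def resource_url(database, table, *key_parts, query=None, start=None, count=None):
    """Build a request path of the form /database/table/key[/sub-key][/?params]."""
    segments = (database, table, *key_parts)
    path = "/" + "/".join(quote(str(s), safe="") for s in segments)
    params = {}
    if query is not None:
        params["query"] = query
    if start is not None:
        params["start"] = start
    if count is not None:
        params["count"] = count
    if params:
        return path + "/?" + urlencode(params)
    return path
```

For example, `resource_url("MailboxDB", "MessageMeta", "bob", 3)` yields the path for message 3, and passing `query="+isUnread:true +isInbox:true", start=0, count=15` yields the paged, filtered listing.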

Page 28: Xldb2011 tue 1005_linked_in

Espresso: API Transactions

• Add a message to bob’s mailbox

• Transactionally update mailbox aggregates, insert into metadata and details

POST /MailboxDB/*/bob HTTP/1.1
Content-Type: multipart/binary; boundary=1299799120
Accept: application/json

--1299799120
Content-Type: application/json
Content-Location: /MailboxDB/MessageStats/bob
Content-Length: 50

{"total":"+1", "unread":"+1"}
--1299799120
Content-Type: application/json
Content-Location: /MailboxDB/MessageMeta/bob
Content-Length: 332

{"from":"…","subject":"…",…}
--1299799120
Content-Type: application/json
Content-Location: /MailboxDB/MessageDetails/bob
Content-Length: 542

{"body":"…"}
--1299799120--
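
The request on this slide is a multipart body in which each part names the table it targets through Content-Location. A client could assemble such a body as sketched below. `multipart_body` is a hypothetical helper and JSON serialization via the standard library is an assumption; the Content-Length values it emits are computed from the actual payloads, so they will differ from the illustrative numbers on the slide.

```python
import json


def multipart_body(boundary, parts):
    """Assemble a multipart body from (content_location, python_object) pairs."""
    lines = []
    for location, obj in parts:
        payload = json.dumps(obj)
        lines += [
            f"--{boundary}",
            "Content-Type: application/json",
            f"Content-Location: {location}",
            f"Content-Length: {len(payload.encode('utf-8'))}",
            "",  # blank line separates part headers from the part body
            payload,
        ]
    lines.append(f"--{boundary}--")  # closing delimiter ends with two hyphens
    return "\r\n".join(lines) + "\r\n"
```

Sending all three parts in one POST is what lets the server apply the statistics update and both inserts as a single transaction for the key bob.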

Page 29: Xldb2011 tue 1005_linked_in

Espresso: System Components

Page 30: Xldb2011 tue 1005_linked_in

Espresso @ LinkedIn

First applications

– Company Profiles

– InMail

Next

– Unified Social Content Platform

– Member Profiles

– Many more…

Page 31: Xldb2011 tue 1005_linked_in

Espresso: Next steps

Launched first application Oct 2011

Open source 2012

Multi-Datacenter support

Log-structured storage

Time-partitioned data

Page 32: Xldb2011 tue 1005_linked_in

Outline

LinkedIn Products

Data Ecosystem

LinkedIn Data Infrastructure Solutions

Next Play

Page 33: Xldb2011 tue 1005_linked_in

The Specialization Paradox in Distributed Systems

Good: Build specialized systems so you can do each thing really well

Bad: Rebuild distributed routing, failover, cluster management, monitoring, tooling

Page 34: Xldb2011 tue 1005_linked_in

Generic Cluster Manager: Helix

• Generic Distributed State Model

• Centralized Config Management

• Automatic Load Balancing

• Fault tolerance

• Health monitoring

• Cluster expansion and rebalancing

• Open Source 2012

• Espresso, Databus and Search
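
A "generic distributed state model" can be pictured as a table of allowed state transitions that a cluster manager walks to move a partition replica from its current state to a target state. The sketch below is illustrative only and does not use the Helix API. The state names are assumptions chosen to match the Master/Slave vocabulary used elsewhere in this deck, and `transition_path` is a hypothetical helper.

```python
from collections import deque

# Assumed example model: a replica must pass through SLAVE to become MASTER.
MASTER_SLAVE = {
    "OFFLINE": ["SLAVE"],
    "SLAVE": ["MASTER", "OFFLINE"],
    "MASTER": ["SLAVE"],
}


def transition_path(model, current, target):
    """Shortest sequence of states from current to target, or None if unreachable."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in model.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because the model is just data, the same manager code can drive a different system by swapping in another transition table, which is the reuse this slide argues for.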

Page 35: Xldb2011 tue 1005_linked_in

Stay tuned for

Innovation

– Nearline processing

– Espresso eco-system

– Storage / indexing

– Analytics engine

– Search

Convergence

– Building blocks for distributed data management systems

Page 36: Xldb2011 tue 1005_linked_in

Thanks!

Page 37: Xldb2011 tue 1005_linked_in

Appendix

Page 38: Xldb2011 tue 1005_linked_in

Espresso: Routing

Router is a high-performance HTTP proxy

Examines URL, extracts partition key

Per-db routing strategy

– Hash Based

– Route To Any (for schema access)

– Range (future)

Routing function maps partition key to partition

Cluster Manager maintains mapping of partition to hosts:

– Single Master

– Multiple Slaves
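
The hash-based strategy described above can be sketched in a few lines: extract the partition key from the URL, hash it to a partition, then look up that partition's master host. The choice of CRC32, the position of the key in the path, and the names `partition_key` and `route` are all assumptions for illustration. The slide does not specify the hash function, and the `masters` mapping stands in for the table kept by the Cluster Manager.

```python
import zlib
from urllib.parse import urlsplit


def partition_key(url):
    """Return the third path segment, e.g. 'bob' in /MailboxDB/MessageMeta/bob/3."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) < 3:
        raise ValueError("URL has no partition key segment")
    return segments[2]


def route(url, num_partitions, masters):
    """Map a request URL to (partition number, master host for that partition)."""
    key = partition_key(url)
    partition = zlib.crc32(key.encode("utf-8")) % num_partitions
    return partition, masters[partition]
```

Requests for the same key always reach the same partition, regardless of sub-key or query string, which is what keeps all of one mailbox's documents on one master.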

Page 39: Xldb2011 tue 1005_linked_in

Espresso: Storage Node

Data Store (MySQL)

– Stores document as Avro serialized blob

– Blob indexed by (partition key {, sub-key})

– Row also contains limited metadata: Etag, Last modified time, Avro schema version

Document Schema specifies per-field index constraints

Lucene index per partition key / resource
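
The row shape described above (a serialized document keyed by partition key and sub-key, plus a little metadata) can be sketched with the standard library. This is a stand-in, not Espresso's schema: SQLite replaces MySQL, JSON replaces Avro, the etag is an assumed content hash, and the table and function names are hypothetical. An empty string can be used as the sub-key for documents that have none.

```python
import hashlib
import json
import sqlite3
import time


def open_store():
    """Create an in-memory table shaped like the row described on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE documents (
            partition_key  TEXT NOT NULL,
            sub_key        TEXT NOT NULL,
            etag           TEXT NOT NULL,
            last_modified  INTEGER NOT NULL,
            schema_version INTEGER NOT NULL,
            document       BLOB NOT NULL,
            PRIMARY KEY (partition_key, sub_key)
        )
        """
    )
    return conn


def put(conn, partition_key, sub_key, doc, schema_version=1):
    """Serialize doc to a blob and upsert it together with its metadata."""
    blob = json.dumps(doc, sort_keys=True).encode("utf-8")  # stand-in for Avro
    etag = hashlib.sha256(blob).hexdigest()  # assumption: etag is a content hash
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?, ?, ?)",
        (partition_key, sub_key, etag, int(time.time()), schema_version, blob),
    )
    return etag


def get(conn, partition_key, sub_key):
    """Return (document, etag, schema_version), or None if the row is absent."""
    row = conn.execute(
        "SELECT document, etag, schema_version FROM documents "
        "WHERE partition_key = ? AND sub_key = ?",
        (partition_key, sub_key),
    ).fetchone()
    if row is None:
        return None
    blob, etag, schema_version = row
    return json.loads(blob), etag, schema_version
```

Storing the schema version next to the blob is what lets a reader pick the right schema to decode a document written before a schema change.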

Page 40: Xldb2011 tue 1005_linked_in

Espresso: Replication

MySQL replication of mastered partitions

MySQL “Slave” is a MySQL instance with a custom storage engine

– Custom storage engine just publishes to Databus

Per-database commit sequence number

Replication is Databus

– Supports existing downstream consumers

Storage node consumes from Databus to update secondary indexes and slave partitions
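
The last point, a storage node applying Databus events to its slave partitions and secondary indexes, can be sketched as follows. The commit sequence number makes replays harmless: an event at or below the last applied number is skipped. `SlavePartition` and its fields are hypothetical, and a dictionary of sets stands in for the Lucene index.

```python
class SlavePartition:
    """Applies change events in commit order and maintains one secondary index."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}        # key -> document
        self.index = {}       # field value -> set of keys with that value
        self.applied_scn = 0  # highest commit sequence number applied so far

    def apply(self, scn, key, doc):
        """Apply one change event; return False if it was already applied."""
        if scn <= self.applied_scn:
            return False  # duplicate from at-least-once delivery
        old = self.rows.get(key)
        if old is not None:
            # Remove the stale index entry before adding the new one.
            self.index[old[self.indexed_field]].discard(key)
        self.rows[key] = doc
        self.index.setdefault(doc[self.indexed_field], set()).add(key)
        self.applied_scn = scn
        return True
```

Updating the row and the index in the same step, ordered by sequence number, keeps the index consistent with the timeline of the master partition.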