Cassandra at eBay - Cassandra Summit 2013

Cassandra @ eBay Jay Patel Architect, Platform Systems @pateljay3001

Upload: jay-patel

Post on 21-May-2015


Category:

Technology



DESCRIPTION

"Buy It Now! Cassandra at eBay" talk at Cassandra Summit 2013. This session will cover various use cases for Cassandra at eBay. It will start with an overview of eBay's heterogeneous data platform, comprised of SQL & NoSQL databases, and where Cassandra fits into it. For each use case, Jay will go into detail on system design, data model & multi-datacenter deployment. To conclude, Jay will summarize the best practices that guide Cassandra utilization at eBay. http://www.datastax.com/company/news-and-events/events/cassandrasummit2013

TRANSCRIPT

Page 1: Cassandra at eBay - Cassandra Summit 2013

Cassandra @ eBay Jay Patel Architect, Platform Systems @pateljay3001

Page 2: Cassandra at eBay - Cassandra Summit 2013

eBay Marketplaces

Thousands of servers

Petabytes of data

Billions of SQLs/day

24x7x365

99.98+% Availability

Turning over a TB every second

Multiple Datacenters

Near-Real-time

Always online

400+ million items for sale

$75 billion+ per year in goods are sold on eBay

Big Data

112 million active users

Billions of page views/day

Page 3: Cassandra at eBay - Cassandra Summit 2013


eBay Site Data Infrastructure

Don’t force! One size does not fit all.

It’s a mixture of multiple SQL & NoSQL databases. We use the right database for the right problem.

Page 4: Cassandra at eBay - Cassandra Summit 2013

eBay Site Data Infrastructure A heterogeneous mixture

Thousands of nodes > 2K sharded logical hosts > 16K tables > 27K indexes > 140 billion SQLs/day > 5 PB provisioned

Hundreds of nodes Persistent & in-memory > 40 billion SQLs/day

10+ clusters, 100+ nodes > 250 TB provisioned (local HDD + shared SSD) > 9 billion writes/day > 5 billion reads/day

Hundreds of nodes > 50 TB > 2 billion ops/day

Thousands of nodes The world's largest cluster with 2K+ nodes

Dozens of nodes

Page 5: Cassandra at eBay - Cassandra Summit 2013

How do we scale RDBMS?

Shard

– Patterns: Modulus, lookup-based, range, etc.

– Application sees only logical shard/database

Replicate

– Disaster recovery, read availability & read scalability

Big NOs

– No transactions

– No joins

– No referential integrity constraints
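The three sharding patterns named on this slide (modulus, lookup-based, range) can be illustrated with a minimal Python sketch. The shard count, the lookup directory, the range boundaries and all function names below are hypothetical; the slides do not show eBay's actual routing code.

```python
import bisect

NUM_SHARDS = 4  # hypothetical number of logical shards


def modulus_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulus pattern: the shard is the key modulo the shard count."""
    return key % num_shards


# Lookup-based pattern: an explicit directory maps each key to a shard.
# The entries here are made up for illustration.
LOOKUP_DIRECTORY = {"user:alice": 2, "user:bob": 0}


def lookup_shard(key: str, directory=LOOKUP_DIRECTORY) -> int:
    """Lookup pattern: consult the directory; raises KeyError if unmapped."""
    return directory[key]


# Range pattern: sorted boundaries split the key space.
# shard 0: key < 1000, shard 1: 1000..1999, shard 2: 2000..2999, shard 3: >= 3000
RANGE_BOUNDS = [1000, 2000, 3000]


def range_shard(key: int, bounds=RANGE_BOUNDS) -> int:
    """Range pattern: binary-search the boundary list for the owning shard."""
    return bisect.bisect_right(bounds, key)
```

In each case the application only ever asks for a logical shard number; where that shard physically lives is resolved elsewhere, which matches the "application sees only logical shard/database" point above.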


Page 6: Cassandra at eBay - Cassandra Summit 2013

Why Cassandra?

Multi-datacenter (active-active)

Always Available - No SPOF

Easy to scale up & down


Write performance

Distributed counters

Hadoop support

Not replacing RDBMS, but complementing!

Some use cases don’t fit well in RDBMS - sparse data, big data, flexible schema, real-time analytics, …

Many use cases don’t need top-tier set-ups.

Page 7: Cassandra at eBay - Cassandra Summit 2013

Cassandra Growth

[Chart: growth from Aug 2011 through Aug 2012 to May 2013. One axis in billions per day (1 to 7) for writes, async. reads and sync. site reads; the other axis in terabytes (50 to 350) for storage capacity.]

Doesn’t predict business


Page 8: Cassandra at eBay - Cassandra Summit 2013

eBay Use Cases on Cassandra

Time-series data, real-time insights & immediate actions

• Fraud detection & prevention

• Quality Click Pricing for affiliates

• Order & shipment tracking and insights

• Mobile notification logging & tracking

• Cloud CMS change history storage

• RedLaser server logs and analytics

Server metrics collection for monitoring & alerting

Taste graph based next-gen recommendation system

Personalization Data Service

Social Signals on eBay Product & Item pages

Milo’s store-item availability inventory (evaluation phase)


Page 9: Cassandra at eBay - Cassandra Summit 2013

Real-time insights & actions for

Fraud Prevention

Reporting

Quality Click Pricing

More…

Page 10: Cassandra at eBay - Cassandra Summit 2013


System Overview

Business Event Stream

Checkout Shipping Refund & Recoup …

Order placed (bin/bid)

Paid Shipped Refunded

Raw data

Simple in-memory aggregations +/ Complex Event Processing +/

Cassandra’s distributed counters

Label printed per day per user User segmentation for affiliate pricing Orders per hour, …

Multiple Cassandra clusters

Payment

Act in real-time

Fraud Prevention

Affiliate Pricing Engine (eBay Partner Network)

Order tracking

Real-time reporting

… (Kept from several months to years)
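The pipeline on this slide (raw business events, simple in-memory aggregation, then counters such as "orders per hour") can be sketched roughly as follows. This is only an illustration under assumptions: the event dictionaries, the `hour_bucket` and `aggregate` names and the hourly bucket format are hypothetical, and the resulting deltas stand in for increments that would be applied to Cassandra's distributed counters.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: float) -> str:
    """Map a Unix timestamp to an hourly bucket label in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")


def aggregate(events):
    """Pre-aggregate events in memory into (event type, hour) -> count deltas.

    Each delta could later be sent as a single counter increment instead of
    one write per raw event.
    """
    deltas = Counter()
    for event in events:
        deltas[(event["type"], hour_bucket(event["ts"]))] += 1
    return deltas


# Hypothetical events: two payments in the first hour, one in the next.
sample_events = [
    {"type": "paid", "ts": 0},
    {"type": "paid", "ts": 10},
    {"type": "paid", "ts": 3600},
]
sample_deltas = aggregate(sample_events)
```

Batching increments this way trades a little freshness for far fewer counter writes, which is one plausible reason to combine in-memory aggregation with counters.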

Page 11: Cassandra at eBay - Cassandra Summit 2013

A glimpse on Data Model


Historic & real-time insights per user per carrier. Sudden & drastic change might be suspicious.

User bucketing based on historic & real-time buying activity.

Page 12: Cassandra at eBay - Cassandra Summit 2013

A glimpse on Data Model


Page 13: Cassandra at eBay - Cassandra Summit 2013

Fraud Detection & Prevention


Shop with Confidence

Page 14: Cassandra at eBay - Cassandra Summit 2013

System Overview


Cassandra

Fraud Detection & Prevention System

Sign-in info

Business events (checkout, sell,…)

StaaS Oracle

Checkout Shipping … Payment Selling

Real-time Beacons data

Real-time Insights

Other data

Machine Learned Models

Page 15: Cassandra at eBay - Cassandra Summit 2013


A glimpse on Data Model

Collected at sign-in & stored as key-value.

Pulled periodically to StaaS for training machine learned models.

Page 16: Cassandra at eBay - Cassandra Summit 2013

Metrics collection for monitoring & alerting


Page 17: Cassandra at eBay - Cassandra Summit 2013

System Overview


Transport (HTTP, …)

Scalable NIO servers based on Netty

Thousands of production machines

Cassandra

Stats for CPU, Memory, Disk, ..

agent agent agent agent …

Server Server Server Server Server

In-memory grid (Hazelcast) for rollups

Page 18: Cassandra at eBay - Cassandra Summit 2013

A glimpse on Data Model


Granular data points

Rolled up metrics for various time intervals
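The relationship between granular data points and rolled-up metrics can be sketched as below. The slide's actual schema is not in the transcript, so the `(timestamp, value)` input format, the `rollup` name and the choice of min/max/avg/count are assumptions made for illustration.

```python
from collections import defaultdict


def rollup(points, interval_s):
    """Roll (unix_ts, value) points up into fixed time intervals.

    Returns {interval_start: {"min", "max", "avg", "count"}}, with
    interval_start aligned down to a multiple of interval_s.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical CPU samples: two in the first minute, one in the second.
cpu_points = [(0, 10.0), (30, 20.0), (60, 30.0)]
per_minute = rollup(cpu_points, 60)
```

Running the same function with larger intervals (for example 3600 for hourly) gives the "various time intervals" view from the same granular input.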

Page 19: Cassandra at eBay - Cassandra Summit 2013

Taste graph based recommendation system


Page 20: Cassandra at eBay - Cassandra Summit 2013

Data Model


Taste Graph

Taste Vector

50 billion+ edges, 600 million+ writes, 3 billion+ reads, 30TB+ of data on SSD

Page 21: Cassandra at eBay - Cassandra Summit 2013

System Overview


Business Event Stream

Recommendation system

Taste Graph Taste Vector

1. Item purchased.

2a. Write purchase edge.

2b. Read other edges for this user & item.

4. Req. recommendations.

5. Finds other items close to user’s coordinates.

6. Reco. shown to user

More: http://www.slideshare.net/planetcassandra/e-bay-nyc
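Step 5 ("finds other items close to user's coordinates") reads like a nearest-neighbor lookup in the taste vector space. The sketch below shows that idea under loud assumptions: the slides do not state the distance metric or vector dimensions, so Euclidean distance, 2-D coordinates and the `nearest_items` name are all hypothetical.

```python
import math


def nearest_items(user_vec, item_vecs, k=2, exclude=()):
    """Return the k item ids whose vectors are closest to user_vec.

    Uses Euclidean distance (an assumption, not taken from the talk).
    Items listed in `exclude`, such as ones already purchased, are skipped.
    """
    candidates = [
        (math.dist(user_vec, vec), item)
        for item, vec in item_vecs.items()
        if item not in exclude
    ]
    return [item for _, item in sorted(candidates)[:k]]


# Hypothetical coordinates for one user and three items.
items = {"a": (1.0, 0.0), "b": (5.0, 5.0), "c": (0.0, 2.0)}
top = nearest_items((0.0, 0.0), items, k=2)
```

A linear scan like this only works for toy data; at the scale quoted on the data model slide, some form of indexing or precomputation would be needed.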

Page 22: Cassandra at eBay - Cassandra Summit 2013

Real-time Personalization Data Service


User performs search using keyword

User gets personalized pages based on implicit/explicit profile

Page 23: Cassandra at eBay - Cassandra Summit 2013

System Overview


Personalization Data Service

CacheMesh (write-back cache)

Heavy writes

eBay site pages (personalized)

Every few mins

in-memory MySQL & XMP DB

Cassandra Oracle (scaled out)

Heavy reads

Cache miss

user profiles

Application SOA services (multiple)

Data Warehouse
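The write-back behavior attributed to CacheMesh (writes absorbed by the cache, flushed to the backing store every few minutes, reads falling through on a cache miss) can be sketched generically. This is not CacheMesh's API: the `WriteBackCache` class, its method names and the plain dict used as a stand-in for the backing database are all hypothetical.

```python
class WriteBackCache:
    """Minimal write-back cache in front of a dict-like backing store."""

    def __init__(self, store):
        self.store = store   # stand-in for the persistent database
        self.cache = {}      # in-memory copies of entries
        self.dirty = set()   # keys written to the cache but not yet persisted

    def get(self, key):
        """Serve from cache; on a cache miss, load from the backing store."""
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        """Absorb the write in memory and mark the key dirty."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Persist all dirty entries; meant to be called periodically."""
        flushed = len(self.dirty)
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
        return flushed
```

The trade-off of this design is that writes made since the last flush are lost if the cache fails, so the flush interval bounds the possible data loss.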

Page 24: Cassandra at eBay - Cassandra Summit 2013

Data Model


• Keep column names short.

• Don't overload one CF with all the data:

- Split hot & cold data into separate CFs.

- Splitting & sharding can help compaction.

Static column families

Page 25: Cassandra at eBay - Cassandra Summit 2013


Served by Cassandra

Social Signals

Page 27: Cassandra at eBay - Cassandra Summit 2013

Multi-Datacenter Deployment


Topology - NTS

RF - 1:1 or 2:2 or 3:3

Read CL - ONE/QUORUM

Write CL - ONE

Data is backed up periodically to protect against human or software error

User request has no datacenter affinity

Non-sticky load balancing
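To see what the RF and CL choices on this slide imply, the sketch below computes how many replicas each consistency level needs and whether a read is guaranteed to overlap the latest write, using the standard rule that reads plus writes must exceed the total replica count. Only ONE, QUORUM and ALL are modeled, and the function names are made up for illustration.

```python
def quorum(total_rf: int) -> int:
    """QUORUM is a majority of all replicas across datacenters."""
    return total_rf // 2 + 1


def replicas_required(cl: str, total_rf: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    return {"ONE": 1, "QUORUM": quorum(total_rf), "ALL": total_rf}[cl]


def read_sees_latest_write(read_cl: str, write_cl: str, rf_per_dc) -> bool:
    """True when R + W > N, so read and write replica sets must overlap."""
    n = sum(rf_per_dc)
    return replicas_required(read_cl, n) + replicas_required(write_cl, n) > n


# RF 3:3 with write CL ONE and read CL QUORUM: 1 + 4 = 5, which is not > 6.
rf_3_3 = (3, 3)
overlap_one_quorum = read_sees_latest_write("QUORUM", "ONE", rf_3_3)
overlap_quorum_quorum = read_sees_latest_write("QUORUM", "QUORUM", rf_3_3)
```

With write CL ONE, even a QUORUM read is not guaranteed to see the latest write for the RF values listed, so this configuration favors latency and availability over strong consistency.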

Page 28: Cassandra at eBay - Cassandra Summit 2013

Multi-Datacenter Deployment

Topology - NTS

RF - 1:1:1 or 2:2:2

Page 29: Cassandra at eBay - Cassandra Summit 2013

Lessons & Best Practices

• One size does not fit all

– Use Cassandra for the right use cases.

• Choose proper Replication Factor and Consistency Level

– They alter latency, availability, durability, consistency and cost.

– Cassandra supports tunable consistency, but remember strong consistency is not free.

• Many ways to model data in Cassandra

– The best way depends on your use case and query patterns.

• De-normalize and duplicate for read performance

– But don’t de-normalize if you don’t need to.

http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices


Page 30: Cassandra at eBay - Cassandra Summit 2013

Are you excited? Come Join Us!


Thank You @pateljay3001

#cassandra13