Espresso - Shahnawaz Saifi & Kiran Chand - DevOps Bangalore meetup March 28th 2015

Espresso - distributed document store By Shahnawaz Saifi & Kiran Chand – Site Reliability Engineering - DDS

Upload: devopsbangalore

Post on 18-Jul-2015


TRANSCRIPT

Page 1: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015


Espresso - distributed document store

By Shahnawaz Saifi & Kiran Chand – Site Reliability Engineering - DDS

Page 2: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Agenda

• Motivation
• Why Espresso?
• Data Model and API
• Architecture

Page 3: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Motivation

• Schema evolution
• Provision shards
• Data center failover
• Cost

Page 4: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Let’s Brew..

• Elasticity
• Consistency
• Distributed
• Fault Tolerant
• Secondary Indexing
• Schema Evolution
• Change Capture Stream
• Bulk Ingests

Page 5: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Data Model - Database

• A database is a container for tables.
• The database schema contains important metadata about the database.
• It defines database traffic quotas, e.g., read/write QPS and volume of data read/written.

Page 6: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Data Model - Table

• A table is a container of homogeneously typed documents.
• Every table schema defines a key-structure, which can have multiple parts.
• The key-structure defines how documents are accessed.
• Every fully specified key is the primary key for a single document.
• The leading key in the table schema is also called the partitioning key.
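As a rough illustration of the key-structure described on this slide, the sketch below splits a resource key into its parts and derives a partition from the leading (partitioning) key. The CRC32 hash, the partition count, and the helper names `split_key` and `partition_for` are assumptions made for this example; the slides do not say how Espresso maps keys to partitions.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count, not from the slides


def split_key(resource_path):
    """Split '/Database/Table/key1/key2/...' into (database, table, key parts)."""
    parts = [p for p in resource_path.split("/") if p]
    if len(parts) < 3:
        raise ValueError("expected /database/table/key...")
    return parts[0], parts[1], parts[2:]


def partition_for(key_parts, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition using only its leading (partitioning) key.

    Hash-mod placement is an assumption used for illustration only.
    """
    partitioning_key = key_parts[0]
    return zlib.crc32(partitioning_key.encode("utf-8")) % num_partitions


database, table, key_parts = split_key("/MailboxDB/Messages/bob/1")
partition = partition_for(key_parts)
```

Because only the leading key feeds the hash, every document under the same partitioning key (here, every message in one mailbox) lands in the same partition in this sketch.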

Page 7: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Data Model and API

Page 8: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Partitioning

Page 9: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Document based data model

A fully specified key uniquely identifies a single document. A document schema is an Avro schema. Internally Espresso stores documents as Avro serialized binary data blobs. The "indexType" attribute implies that a secondary index has to be built on that field.

Messages

Key:
  mailboxID : String
  messageID : long

Document schema:
  from : { name : String, email : String }
  subject : String
  body : String
  unread : boolean

Example document:
  from : { name : "Chris", email : "[email protected]" }
  subject : "Go Giants!"
  body : "World Series 2012! w00t!"
  unread : true
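For readers unfamiliar with Avro, the following sketch shows what a document schema for the Messages table might look like as an Avro record, written as a Python dictionary. Avro permits extra attributes on fields, which is how an "indexType" annotation can ride along. The record names, the choice of which fields are indexed, and the "indexType" values are all assumptions for illustration; the slides give only the field names and types.

```python
import json

# Hypothetical Avro record schema for a Messages document.
# The "indexType" values and the choice of indexed fields are assumptions.
MESSAGE_SCHEMA = {
    "type": "record",
    "name": "Message",
    "fields": [
        {
            "name": "from",
            "type": {
                "type": "record",
                "name": "Sender",
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "email", "type": "string"},
                ],
            },
        },
        {"name": "subject", "type": "string", "indexType": "text"},
        {"name": "body", "type": "string", "indexType": "text"},
        {"name": "unread", "type": "boolean", "indexType": "attribute"},
    ],
}


def indexed_fields(schema):
    """Return the names of fields that carry an 'indexType' attribute."""
    return [f["name"] for f in schema["fields"] if "indexType" in f]


schema_json = json.dumps(MESSAGE_SCHEMA, indent=2)
```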

Page 10: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

REST based API

• Get a single message from Bob’s mailbox
  –  GET /MailboxDB/Messages/bob/1
• Multi-GET several messages
  –  GET /MailboxDB/Messages/bob/(1,2,3)
• Query for a page of unread messages
  –  GET /MailboxDB/Messages/bob/?query="+isUnread:true"&start=0&count=15
• Write a new message
  PUT /MailboxDB/Messages/bob
  Content-Type: application/json
  Content-Length: 137

  {"from" : …, "subject" : …, "body" : …}
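The URL shapes on this slide follow a /database/table/key pattern, which a client could assemble as in the sketch below. The helper names are hypothetical, and percent-encoding the query string is an assumption about what a server would accept; the slide shows the query unencoded.

```python
from urllib.parse import quote, urlencode


def resource_path(database, table, *key_parts):
    """Build '/database/table/key1/key2/...' with each segment percent-encoded."""
    segments = [database, table, *map(str, key_parts)]
    return "/" + "/".join(quote(s, safe="") for s in segments)


def multi_get_path(database, table, partitioning_key, ids):
    """Build a multi-GET path such as '/MailboxDB/Messages/bob/(1,2,3)'."""
    id_list = ",".join(str(i) for i in ids)
    return f"{resource_path(database, table, partitioning_key)}/({id_list})"


def query_path(database, table, partitioning_key, query, start=0, count=15):
    """Build a paged secondary-index query under one partitioning key."""
    params = urlencode({"query": query, "start": start, "count": count})
    return f"{resource_path(database, table, partitioning_key)}/?{params}"


single = resource_path("MailboxDB", "Messages", "bob", 1)
several = multi_get_path("MailboxDB", "Messages", "bob", [1, 2, 3])
unread_page = query_path("MailboxDB", "Messages", "bob", "+isUnread:true")
```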

Page 11: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Partial Updates and Conditional Operations

• Mark a message as read (partial update)
  POST /MailboxDB/Messages/bob/1
  Content-Type: application/json
  Content-Length: 21

  {"unread" : "false"}

• Get a message, only if recently updated
  GET /MailboxDB/Messages/bob/1
  If-Match: Wed, 31 Oct 2012 02:54:12 GMT

Page 12: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Architecture

Page 13: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Generic Cluster Manager : Apache Helix

• Automatic assignment of resources and partitions to nodes
• Node failure detection and recovery
• Dynamic addition of resources
• Dynamic addition of nodes to the cluster
• Pluggable distributed state machine to manage the state of a resource via state transitions
• Automatic load balancing and throttling of transitions
• Optional pluggable rebalancing for user-defined assignment of resources and partitions
• More Info:
  –  http://helix.apache.org

Page 14: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Espresso state model

• Every partition must have only 1 master.
• Every partition can have up to 'n' configurable slaves.
• Partitions are distributed evenly across all storage nodes.
• No replicas of the same partition may be present on the same node.
• Upon master failover, one of the slaves must be promoted to master.
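The constraints above can be read as invariants over a partition-to-replica assignment. The sketch below checks three of them and shows a naive failover step. It is a standalone illustration, not Helix code: the data layout, the function names, and the rule of promoting the first surviving slave are assumptions.

```python
from collections import Counter


def violations(assignment, max_slaves):
    """Check state-model constraints.

    assignment: {partition_id: [(node, state), ...]} with state "MASTER" or "SLAVE".
    Returns a list of human-readable problems (empty when all constraints hold).
    """
    problems = []
    for partition, replicas in assignment.items():
        states = Counter(state for _, state in replicas)
        nodes = [node for node, _ in replicas]
        if states["MASTER"] != 1:
            problems.append(f"partition {partition}: {states['MASTER']} masters")
        if states["SLAVE"] > max_slaves:
            problems.append(f"partition {partition}: too many slaves")
        if len(set(nodes)) != len(nodes):
            problems.append(f"partition {partition}: two replicas on one node")
    return problems


def fail_over(replicas, failed_node):
    """Drop replicas on failed_node; if the master was lost, promote a slave.

    Promoting the first surviving slave is an arbitrary choice for this sketch.
    """
    survivors = [(n, s) for n, s in replicas if n != failed_node]
    if not any(s == "MASTER" for _, s in survivors):
        for i, (node, state) in enumerate(survivors):
            if state == "SLAVE":
                survivors[i] = (node, "MASTER")
                break
    return survivors


healthy = {0: [("n1", "MASTER"), ("n2", "SLAVE")]}
after_failure = fail_over(healthy[0], "n1")
```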

Page 15: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Router

• REST API
• Helix client (Spectator)
• Constructs storage node requests
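A minimal sketch of the router's job, assuming it holds a routing table of the kind a Helix spectator would observe (partition to node to state) and sends each request to the partition's master. The table layout, host names, and function names are hypothetical, and routing every request to the master is an assumption made for this example.

```python
def master_for(routing_table, partition):
    """Return the node currently in MASTER state for a partition."""
    for node, state in routing_table[partition].items():
        if state == "MASTER":
            return node
    raise LookupError(f"no master for partition {partition}")


def storage_node_request(routing_table, partition, method, path):
    """Describe the request the router would send to a storage node."""
    node = master_for(routing_table, partition)
    return {"method": method, "url": f"http://{node}{path}", "partition": partition}


# Hypothetical snapshot of cluster state for one partition.
routing_table = {3: {"node-a:12345": "SLAVE", "node-b:12345": "MASTER"}}
request = storage_node_request(routing_table, 3, "GET", "/MailboxDB/Messages/bob/1")
```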

Page 16: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Storage Node

• Query Processing
• Storage Engine
• Secondary Indexes
• Handling State Transitions
• Local Transactional Support
• Replication Commit Log
• Utility functions
• Scheduled Backups

Page 17: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Databus

Databus is used for several purposes by Espresso:
• Deliver events to downstream consumers, e.g., search indexes and caches.
• Espresso multi-datacenter replication: each locally originated write is forwarded on to remote data centers. This is discussed in more detail in the data replicator section.
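The phrase "each locally originated write is forwarded" suggests a filter on the change stream: only events that originated in the local data center are sent on, so a replicated write is not forwarded a second time. The sketch below illustrates that idea with a hypothetical event type; the field names and the per-partition sequence ordering are assumptions, not the Databus API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change-capture event; field names are illustrative only."""
    partition: int
    sequence: int
    origin_dc: str
    payload: bytes


def events_to_forward(events, local_dc):
    """Keep locally originated writes, ordered by partition and sequence."""
    local = [e for e in events if e.origin_dc == local_dc]
    return sorted(local, key=lambda e: (e.partition, e.sequence))


stream = [
    ChangeEvent(partition=0, sequence=2, origin_dc="dc-east", payload=b"b"),
    ChangeEvent(partition=0, sequence=1, origin_dc="dc-east", payload=b"a"),
    ChangeEvent(partition=0, sequence=3, origin_dc="dc-west", payload=b"c"),
]
outbound = events_to_forward(stream, local_dc="dc-east")
```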

Page 18: Espresso -  Shahnawaz Saifi & Kiran Chand  - DevOps Bangalore meetup March 28th 2015

Cross Colo Replication and ETL

• Data Replicator
  –  forwards commits between geo-replicated Espresso clusters.
  –  a Databus consumer that consumes events for each database partition within a cluster.
  –  contains a clustered set of stateless instances managed by Helix.
• Snapshot service
