Espresso - Shahnawaz Saifi & Kiran Chand - DevOps Bangalore Meetup, March 28th 2015
Espresso - distributed document store
By Shahnawaz Saifi & Kiran Chand – Site Reliability Engineering - DDS
Agenda
• Motivation
• Why Espresso?
• Data Model and API
• Architecture
Motivation
• Schema evolution
• Provision shards
• Data center failover
• Cost
Let’s Brew...
• Elasticity
• Consistency
• Distributed
• Fault Tolerant
• Secondary Indexing
• Schema Evolution
• Change Capture Stream
• Bulk Ingests
Data Model - Database
• A database is a container for tables.
• The database schema contains important metadata about the database.
• It defines the database's traffic quotas, e.g., read/write QPS and the volume of data read/written.
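The slides say only that the database schema defines traffic quotas; they don't say how those quotas are enforced. As a hedged illustration, the sketch below enforces read/write QPS limits with a simple fixed-window counter. The class and field names are hypothetical, the fixed-window technique is our choice rather than Espresso's documented mechanism, and data-volume quotas are left out.

```python
from dataclasses import dataclass


@dataclass
class DatabaseQuota:
    """Hypothetical per-database quota, as a database schema might declare it."""
    read_qps: int
    write_qps: int


@dataclass
class FixedWindowLimiter:
    """Counts reads and writes per one-second window and rejects the excess."""
    quota: DatabaseQuota
    window: int = -1  # the second currently being counted
    reads: int = 0
    writes: int = 0

    def allow(self, op: str, now_s: float) -> bool:
        second = int(now_s)
        if second != self.window:
            # A new one-second window starts: reset both counters.
            self.window, self.reads, self.writes = second, 0, 0
        if op == "read":
            if self.reads >= self.quota.read_qps:
                return False
            self.reads += 1
            return True
        if op == "write":
            if self.writes >= self.quota.write_qps:
                return False
            self.writes += 1
            return True
        raise ValueError(f"unknown operation: {op}")
```

A fixed window is the simplest scheme but allows short bursts at window boundaries; a token bucket would smooth those out.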
Data Model - Table
• A table is a container of homogeneously typed documents.
• Every table schema defines a key structure, which can have multiple parts.
• The key structure defines how documents are accessed.
• Every fully specified key is the primary key for a single document.
• The leading key in the table schema is also called the partitioning key.
Data Model and API
Partitioning
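Because the leading key is the partitioning key, all documents that share a leading key (for example, every message in one mailbox) live in the same partition. Below is a minimal sketch assuming hash partitioning of the leading key; the hash function, the partition count and the helper names are illustrative assumptions, not Espresso's code.

```python
import hashlib


def split_key(resource_key: str) -> tuple:
    """Split a multi-part key such as 'bob/1' into ('bob', '1')."""
    return tuple(resource_key.split("/"))


def partition_for(partitioning_key: str, num_partitions: int) -> int:
    """Map the leading (partitioning) key to a partition id.

    MD5 is used only because it is stable across processes; Python's
    built-in hash() is randomized per process for strings.
    """
    digest = hashlib.md5(partitioning_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Both messages share the leading key "bob", so they map to the same partition.
NUM_PARTITIONS = 1024  # hypothetical partition count
p1 = partition_for(split_key("bob/1")[0], NUM_PARTITIONS)
p2 = partition_for(split_key("bob/2")[0], NUM_PARTITIONS)
print(p1 == p2)  # True
```

Keeping one mailbox in one partition means a query such as "unread messages for bob" can be answered by a single storage node.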
Document-based data model
A fully specified key uniquely identifies a single document. The document schema is an Avro schema, and internally Espresso stores documents as Avro-serialized binary blobs. An "indexType" attribute on a field indicates that a secondary index must be built on that field.
Example document:
  from : { name : "Chris", email : "[email protected]" }
  subject : "Go Giants!"
  body : "World Series 2012! w00t!"
  unread : true
Messages
  Key structure:
    mailboxID : String
    messageID : long
  Document schema:
    from : { name : String, email : String }
    subject : String
    body : String
    unread : boolean
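Since the document schema is an Avro schema and "indexType" marks fields for secondary indexing, the Messages document above might be declared roughly as follows. This is a hedged sketch: the record names and the "text" value of "indexType" are assumptions. The Avro specification allows extra attributes on a field and treats them as metadata, which is what makes this kind of annotation possible.

```python
import json

# Hypothetical Avro record schema for a Messages document. Field names follow
# the slide; the "indexType": "text" values are an assumed spelling.
MESSAGE_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Message",
  "fields": [
    {"name": "from", "type": {
      "type": "record", "name": "Sender",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string"}
      ]
    }},
    {"name": "subject", "type": "string", "indexType": "text"},
    {"name": "body", "type": "string"},
    {"name": "unread", "type": "boolean", "indexType": "text"}
  ]
}
""")


def indexed_fields(schema: dict) -> list:
    """Return the names of top-level fields that request a secondary index."""
    return [f["name"] for f in schema["fields"] if "indexType" in f]


print(indexed_fields(MESSAGE_SCHEMA))  # ['subject', 'unread']
```

The key parts (mailboxID, messageID) are not in the document schema; they belong to the table's key structure.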
REST-based API
• Get a single message from Bob’s mailbox
  – GET /MailboxDB/Messages/bob/1
• Multi-GET several messages
  – GET /MailboxDB/Messages/bob/(1,2,3)
• Query for a page of unread messages
  – GET /MailboxDB/Messages/bob/?query="+isUnread:true"&start=0&count=15
• Write a new message
  PUT /MailboxDB/Messages/bob
  Content-Type: application/json
  Content-Length: 137

  {"from" : …, "subject" : …, "body" : …}
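The URL scheme in these examples is regular: database, table, then the parts of the key. The sketch below builds such paths on the client side. It is hypothetical (no Espresso client library is implied), and it percent-encodes the query parameters, treating the quotation marks around the query on the slide as notation rather than part of the value, which is an assumption.

```python
from urllib.parse import quote, urlencode


def doc_url(db: str, table: str, *key_parts) -> str:
    """Path for a document or key prefix, e.g. /MailboxDB/Messages/bob/1."""
    parts = (db, table, *key_parts)
    return "/" + "/".join(quote(str(p), safe="") for p in parts)


def multi_get_url(db: str, table: str, leading_key: str, ids) -> str:
    """Path for a multi-GET of several documents under one leading key."""
    id_list = ",".join(str(i) for i in ids)
    return f"{doc_url(db, table, leading_key)}/({id_list})"


def query_url(db: str, table: str, leading_key: str, query: str,
              start: int = 0, count: int = 15) -> str:
    """Path for a secondary-index query scoped to one leading key."""
    params = urlencode({"query": query, "start": start, "count": count})
    return f"{doc_url(db, table, leading_key)}/?{params}"


print(doc_url("MailboxDB", "Messages", "bob", 1))
# /MailboxDB/Messages/bob/1
print(multi_get_url("MailboxDB", "Messages", "bob", [1, 2, 3]))
# /MailboxDB/Messages/bob/(1,2,3)
print(query_url("MailboxDB", "Messages", "bob", "+isUnread:true"))
# /MailboxDB/Messages/bob/?query=%2BisUnread%3Atrue&start=0&count=15
```

Percent-encoding matters for the query: an unencoded "+" in a query string is commonly decoded as a space.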
Partial Updates and Conditional Operations
• Mark a message as read (partial update)
  POST /MailboxDB/Messages/bob/1
  Content-Type: application/json
  Content-Length: 21

  {"unread" : "false"}
• Get a message only if it was recently updated
  GET /MailboxDB/Messages/bob/1
  If-Modified-Since: Wed, 31 Oct 2012 02:54:12 GMT
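On the storage side, a partial update replaces only the fields present in the request body, and a date-conditioned read compares the document's last-modified time with the date in the request. For the "only if recently updated" case, the standard HTTP date precondition is If-Modified-Since (RFC 7232). This is a hedged sketch with hypothetical helper names; Espresso's actual merge and timestamp handling may differ.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def apply_partial_update(document: dict, patch: dict) -> dict:
    """Return a copy of the document with only the patched top-level fields replaced."""
    updated = dict(document)
    updated.update(patch)
    return updated


def modified_since(last_modified: datetime, header_value: str) -> bool:
    """True if the document changed after the If-Modified-Since date.

    HTTP dates have one-second resolution, so microseconds are dropped
    before comparing. last_modified must be timezone-aware.
    """
    since = parsedate_to_datetime(header_value)  # "GMT" yields an aware UTC datetime
    return last_modified.replace(microsecond=0) > since
```

When modified_since returns False, an HTTP server would typically answer 304 Not Modified instead of returning the document.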
Architecture
Generic Cluster Manager : Apache Helix
• Automatic assignment of resources and partitions to nodes
• Node failure detection and recovery
• Dynamic addition of resources
• Dynamic addition of nodes to the cluster
• Pluggable distributed state machine to manage the state of a resource via state transitions
• Automatic load balancing and throttling of transitions
• Optional pluggable rebalancing for user-defined assignment of resources and partitions
• More info:
  – http://helix.apache.org
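Helix manages each partition replica through a pluggable state machine. The sketch below shows the idea for a master/slave model: a table of legal transitions and a replica object that refuses anything else. The state names mirror Helix's MasterSlave model, but the transition table here is a simplified assumption, and the class is hypothetical rather than the Helix participant API.

```python
# Simplified set of legal (from_state, to_state) transitions (assumption).
VALID_TRANSITIONS = {
    ("OFFLINE", "SLAVE"),
    ("SLAVE", "MASTER"),
    ("MASTER", "SLAVE"),
    ("SLAVE", "OFFLINE"),
    ("OFFLINE", "DROPPED"),
}


class PartitionReplica:
    """Tracks the state of one partition replica hosted on one node."""

    def __init__(self, partition: str) -> None:
        self.partition = partition
        self.state = "OFFLINE"

    def transition(self, to_state: str) -> None:
        """Move to to_state, rejecting transitions the model does not allow."""
        if (self.state, to_state) not in VALID_TRANSITIONS:
            raise ValueError(f"illegal transition {self.state} -> {to_state}")
        # A real participant would do work here before switching state,
        # e.g. catch up from the replication log (assumption).
        self.state = to_state
```

Forcing a MASTER to step down to SLAVE before going OFFLINE gives the controller a point at which to promote another replica.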
Espresso state model
• Every partition must have only one master and can have up to n slaves, where n is configurable.
• Partitions are distributed evenly across all storage nodes.
• No two replicas of the same partition may be placed on the same node.
• On master failover, one of the slaves must be promoted to master.
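A simple round-robin placement satisfies these rules. The sketch below is not how the Helix rebalancer computes assignments; it is a hedged illustration, with hypothetical function names, of one placement that meets the rules and of promoting a slave when a master's node fails.

```python
def assign(partitions: int, nodes: list, replicas: int) -> dict:
    """Place `replicas` copies of each partition; the first listed node is master.

    Offsetting by the partition id spreads masters evenly, and consecutive
    offsets guarantee no node holds two replicas of the same partition.
    """
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(replicas)]
        for p in range(partitions)
    }


def fail_node(assignment: dict, dead: str) -> dict:
    """Remove a dead node from every partition.

    If it was the master, the first surviving slave moves to the front of
    the list and so becomes the new master.
    """
    return {p: [n for n in owners if n != dead] for p, owners in assignment.items()}
```

With one master and at least one slave per partition, a single node failure still leaves every partition with a replica to promote.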
Router
• REST API
• Helix client (Spectator)
• Constructs storage node requests
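As a Helix spectator, the router can see which node holds which replica of each partition, and it uses that view to decide where to send a request. The sketch below assumes requests are sent to the partition's MASTER replica. The routing-table shape and the partition_of callback are hypothetical, and the path is forwarded unchanged for simplicity.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cluster view: partition id -> {storage node: replica state}.
RoutingTable = Dict[int, Dict[str, str]]


def master_for(table: RoutingTable, partition: int) -> str:
    """Return the storage node currently holding the MASTER replica."""
    for node, state in table.get(partition, {}).items():
        if state == "MASTER":
            return node
    raise LookupError(f"no master for partition {partition}")


def storage_request(path: str, table: RoutingTable,
                    partition_of: Callable[[str, str], int]) -> Tuple[str, str]:
    """Pick the target node for a REST path such as /MailboxDB/Messages/bob/1.

    partition_of(database, leading_key) is a hypothetical partition function.
    Returns (node, path to forward).
    """
    database, _table, leading_key = path.strip("/").split("/")[:3]
    node = master_for(table, partition_of(database, leading_key))
    return node, path
```

When Helix promotes a slave after a failure, the spectator's view changes and new requests follow the new master without router restarts.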
Storage Node
• Query Processing
• Storage Engine
• Secondary Indexes
• Handling State Transitions
• Local Transactional Support
• Replication Commit Log
• Utility functions
• Scheduled Backups
Databus
Espresso uses Databus for several purposes:
• Delivering events to downstream consumers, e.g., search indexes and caches.
• Espresso multi-datacenter replication: each locally originated write is forwarded to remote data centers. This is discussed in more detail in the data replicator section.
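A downstream consumer such as a cache typically applies change events in order and remembers a checkpoint, so that after a restart it can resume without applying an event twice. The sketch below assumes each event carries a sequence number that increases within a partition; the event fields and class names are hypothetical and are not the Databus client API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event for one document write in one partition."""
    scn: int               # sequence number, increasing within a partition
    key: str               # e.g. "bob/1"
    value: Optional[dict]  # None stands for a delete


class CacheConsumer:
    """Keeps a downstream cache in sync with one partition's change stream."""

    def __init__(self) -> None:
        self.cache: Dict[str, dict] = {}
        self.checkpoint = 0  # highest sequence number applied so far

    def on_event(self, ev: ChangeEvent) -> None:
        if ev.scn <= self.checkpoint:
            return  # already applied, e.g. redelivered after a restart
        if ev.value is None:
            self.cache.pop(ev.key, None)
        else:
            self.cache[ev.key] = ev.value
        self.checkpoint = ev.scn
```

Persisting the checkpoint together with the cache state is what makes redelivery safe to ignore.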
Cross Colo Replication and ETL
• Data Replicator
  – forwards commits between geo-replicated Espresso clusters.
  – is a Databus consumer that consumes events for each database partition within a cluster.
  – consists of a clustered set of stateless instances managed by Helix.
• Snapshot service
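Only locally originated writes are forwarded to remote data centers. A write that arrived from another data center must not be forwarded again, or it would bounce between data centers forever. This minimal sketch assumes each event records the data center where it was first committed; the names are hypothetical, and delivery to the remote cluster is abstracted as a `send` callback.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ReplicatedWrite:
    """Hypothetical committed write as seen by the replicator."""
    partition: int
    scn: int
    origin_dc: str  # data center where the write was first committed
    payload: dict


def forward_local_writes(events: Iterable[ReplicatedWrite], local_dc: str,
                         send: Callable[[ReplicatedWrite], None]) -> int:
    """Forward writes that originated in local_dc; return how many were sent.

    Writes that came from another data center are skipped so they are not
    echoed back, which would create a replication loop.
    """
    sent = 0
    for ev in events:
        if ev.origin_dc == local_dc:
            send(ev)
            sent += 1
    return sent
```

Because the replicator instances are stateless, Helix can move a partition's forwarding work to another instance on failure, provided the consumer checkpoint is stored elsewhere.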