MongoDB @ eBay
Overview of strategy and use cases
Yuri Finkelstein
eBay Platform [email protected]
May 2012
DB Scalability @ eBay
eBay is one of the first and largest BASE environments based on Oracle DB
• Basic Availability
• Soft-state
• Eventual consistency
Every database we use is shared and partitioned
• N logical host names are defined for each use case ahead of time
• These logical hosts are mapped to physical hosts based on static mapping tables which are controlled by DBAs
• A common ORM framework called DAL provides powerful and consistent patterns for data scalability
If the client provides a hint along with every DB query:
• DAL maps the hint to a logical host using one of N mapping schemes (ex: modulus, lookup table, range, etc.)
• The logical host is then mapped to a physical host using the L-to-Ph map
• The query is sent to just one shard
If the client does not have a hint, the query is sent to all shards and the results are joined on the client with the help of the DAL framework
Side-effects:
• Hint is not part of the query; client has to manage it
• Logical-to-physical mapping scheme becomes an extra piece of client configuration
• Shard rebalancing is “DBA magic”
[Diagram: applications App1 and App2 each contain Business Logic and the DAL Framework; a Hint (shard key) passes through mapping functions F1(Hint) and F2(Hint), driven by Config, to logical DB hosts (shards), which map to physical master DB hosts and physical standby DB hosts]
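The hint-based routing described above can be sketched in a few lines. This is a minimal illustration, not the DAL implementation: the table contents, host names, and the function names `by_modulus`, `by_range`, and `route` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical static tables; on the slides these are controlled by DBAs.
N_LOGICAL = 4
LOGICAL_TO_PHYSICAL = {0: "db-host-a", 1: "db-host-a", 2: "db-host-b", 3: "db-host-b"}
RANGE_SPLIT_POINTS = [1000, 2000, 3000]  # boundaries between logical hosts


def by_modulus(hint: int) -> int:
    """Modulus scheme: logical host = hint mod N."""
    return hint % N_LOGICAL


def by_range(hint: int) -> int:
    """Range scheme: logical host is the index of the range containing the hint."""
    return bisect_right(RANGE_SPLIT_POINTS, hint)


def route(hint, scheme=by_modulus):
    """Return the (logical, physical) host pairs the query must be sent to."""
    if hint is None:
        # No hint: fan out to every shard; results are joined on the client.
        logical_hosts = range(N_LOGICAL)
    else:
        logical_hosts = [scheme(hint)]
    return [(lh, LOGICAL_TO_PHYSICAL[lh]) for lh in logical_hosts]
```

With a hint, `route` returns exactly one target; without one, it returns every logical host, which is the scatter case the client then has to merge.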
Key desired improvements
All eBay site-facing applications use the scheme outlined above
It’s proven to scale to tens of thousands of developers, petabytes of data, hundreds of millions of SQL queries per day
But there is always room for improvement and new ideas
• ORM is not the fastest way to develop; how do we achieve faster development cycles and reduce schema mapping friction?
• How do we add new attributes to tables faster and without DBA involvement? A schema-free approach sounds interesting.
• Can we make the hint transparent, ex: auto-extract it from queries?
• Can we rebalance the data seamlessly and automatically?
• Can we add shards faster in order to scale out on demand and transparently to applications?
• How do we deploy new DBs to the cloud on demand?
And what about performance? Can we use RAM more aggressively and seamlessly to speed up queries?
Enter MongoDB
We have been experimenting with MongoDB since 2010. Why?
Its scalability scheme is very similar to how we shard RDBMS
• Single master for writes, eventually consistent slaves for reads
• Horizontal partitioning of data sets is the norm at eBay
• MongoS performs the familiar scatter-gather and client-side merge-sorts
We have not used distributed transactions since day 1; the transactional updates of multiple tables that we do use can be simulated by atomic updates of a single Mongo document
MongoDB offers a number of features that help address our goals mentioned earlier:
• Developers love the document model and schema-free persistence
• Hints are embedded into the queries
• MongoDB has automatic shard rebalancing
• Shards can be added on demand without application restart and data will be auto-rebalanced
• We can easily bring it up in the cloud since cloud machines have storage
[Diagram: Business Logic passes a Document through Morphia/Mongo Driver to MongoS, which applies F(Shard Key) using Dynamic Config to route to the shards (columns), each backed by replicas (rows)]
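The scatter-gather and merge-sort pattern mentioned above can be illustrated with plain Python lists standing in for shards. This is a sketch of the general technique only, not of MongoS internals; `scatter_gather` and its parameters are hypothetical names.

```python
import heapq
from itertools import islice


def scatter_gather(shards, predicate, sort_key, limit):
    """Run the same filtered, sorted query on every shard, then merge the results."""
    # Scatter: each shard filters and sorts its own documents.
    per_shard = [
        sorted((doc for doc in shard if predicate(doc)), key=sort_key)
        for shard in shards
    ]
    # Gather: heapq.merge lazily merges already-sorted inputs,
    # so only the first `limit` documents are actually pulled.
    return list(islice(heapq.merge(*per_shard, key=sort_key), limit))


shards = [
    [{"_id": 1, "price": 30}, {"_id": 4, "price": 10}],
    [{"_id": 2, "price": 20}],
    [{"_id": 3, "price": 5}, {"_id": 5, "price": 50}],
]
cheapest = scatter_gather(
    shards,
    predicate=lambda d: d["price"] >= 10,
    sort_key=lambda d: d["price"],
    limit=3,
)
```

Because each shard returns its results already sorted, the merge step costs only a heap of one cursor per shard rather than a full re-sort.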
Case study #1: eBay Search Suggestions
Search suggestion list is a MongoDB document indexed by word prefix as well as by some metadata: product category, search domain, etc.
End-to-end round trip must be < 60-70 msec
MongoDB query < 1.4 msec
Data set fits in RAM; hundreds of millions of documents
Data is bulk loaded once a day from Hadoop, but can be tweaked on demand during sale promotions, etc.
Single replica set, no shards in this case
MongoDB benefits:
• Multiple indexes allow flexible lookups
• In-memory data placement ensures lookup speed
• Large data set is durable and replicated
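To make the "multiple indexes, flexible lookups" point concrete, here is a small in-memory sketch. The document shape, the field names (`prefix`, `category`, `domain`, `suggestions`), and the helper functions are assumptions for illustration; the slides do not show the real schema. Dictionaries stand in for secondary indexes.

```python
from collections import defaultdict

# Hypothetical document shape: one suggestion list per prefix and metadata combination.
docs = [
    {"prefix": "ip", "category": "electronics", "domain": "US", "suggestions": ["iphone", "ipad"]},
    {"prefix": "ip", "category": "books", "domain": "US", "suggestions": ["ip law"]},
    {"prefix": "ca", "category": "electronics", "domain": "US", "suggestions": ["camera"]},
]


def build_index(documents, fields):
    """Stand-in for a secondary index: maps a tuple of field values to documents."""
    index = defaultdict(list)
    for doc in documents:
        index[tuple(doc[f] for f in fields)].append(doc)
    return index


by_prefix = build_index(docs, ["prefix"])
by_prefix_category = build_index(docs, ["prefix", "category"])


def suggest(prefix, category=None):
    """Look up suggestions by prefix alone, or narrowed by category."""
    if category is None:
        hits = by_prefix.get((prefix,), [])
    else:
        hits = by_prefix_category.get((prefix, category), [])
    return [s for doc in hits for s in doc["suggestions"]]
```

Each lookup is a single index probe, which is consistent with the very small per-query latency budget quoted on the slide.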
Case Study #2: Cloud Manager “State Hub”
State Hub powers eBay Cloud
Every resource provisioned by the cloud is represented by a single Mongo document
Documents contain highly structured metadata reflecting roles and grouping of the resources
Lookup by both primary and secondary indexes
Several GB data sets, easily fit in RAM
Documents are not uniform
All resources have a “State” field which is updated periodically to reflect the health of the underlying resource
Mixed workload: lots of in-place writes, but also lots of read queries
[Diagram: State Hub, backed by Mongo, serves three operations: Provision Resources, Query Resources and Topology, Update resource state]
Case Study #3: eBay Merchandizing Info Cache
Merchandizing backend powers eBay product/item classification and categorization
Each MongoDB document represents a cluster of similar products
Numerous relationships between clusters are modeled as document attributes
Relationship hierarchy traversal is achieved by issuing a number of queries on “edge” attributes
Each instance of such a hierarchy is called a model; there are lots of models
Again, data set fits in RAM, single replica set
Replica set members are located in 3 different data centers (3+2+2) with all members in a single data center having higher weight to avoid moving master away
MongoDB benefits:
• Schema-free design and declarative indexes are perfect for this use case where new attributes and new queries are constantly being added
• Async replication across multiple data centers
• MongoDB Java Driver ensures automatic detection of proximity of clients to replica set members; reads with slaveOK=true are served from local data center nodes, which ensures low response latency
[Diagram: Model 1 contains Cluster1, Cluster2, and Cluster3, connected by relationships R1, R2, and R3]
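Traversing a relationship hierarchy by issuing queries on "edge" attributes might look like the sketch below. The `edges` field, the way R1, R2, and R3 connect the clusters, and the function names are assumptions; the slides name the clusters and relationships but not the schema or the exact topology.

```python
# Hypothetical cluster documents; "edges" maps a relationship name to target _ids.
clusters = {
    "Cluster1": {"_id": "Cluster1", "edges": {"R1": ["Cluster2"], "R2": ["Cluster3"]}},
    "Cluster2": {"_id": "Cluster2", "edges": {"R3": ["Cluster3"]}},
    "Cluster3": {"_id": "Cluster3", "edges": {}},
}


def find_by_ids(ids):
    """Stand-in for one indexed query that fetches a batch of documents by _id."""
    return [clusters[i] for i in ids if i in clusters]


def traverse(root_id):
    """Breadth-first traversal that issues one query per level of the hierarchy."""
    visited = {root_id}
    order = []
    frontier = [root_id]
    queries = 0
    while frontier:
        batch = find_by_ids(frontier)
        queries += 1
        next_frontier = []
        for doc in batch:
            order.append(doc["_id"])
            for targets in doc["edges"].values():
                for target in targets:
                    if target not in visited:
                        visited.add(target)
                        next_frontier.append(target)
        frontier = next_frontier
    return order, queries
```

Batching each level into one query keeps the number of round trips proportional to the depth of the hierarchy rather than to the number of clusters.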
Case Study #4: Zoom – Media Metadata Store
This is a new mega project which is a work in progress
MongoDB is being evaluated as a storage backend for all media-related metadata on the site (example: picture IDs with lots of attributes)
Requirements:
• Tens of TBs data set, millions of documents: data set must be partitioned; this is our first use case where MongoDB sharding is used
• System of record for picture info; data cannot be lost!
• Replication/DR across 2 data centers; local DC reads are required
• Queries are from site-facing flows; <10 msec response time SLA
• Mixed workload: both inserts and reads are happening concurrently all the time
Can MongoDB do it?
Zoom: Data Model
2 main collections: Item and Image
• Item references multiple Images
Item represents an eBay Item:
• _id in Item is the external ID of the item in the eBay site DB
• These IDs are already sharded and balanced across N logical DB hosts using ID ranges
• We use MongoDB pre-split points for the initial mapping of our N site DB shards to M MongoDB shards
• This ensures good balance between the shards
Image represents a picture attached to an Item
• _id in Image is the MD5 of the image content
• This ensures good distribution across any number of shards
• MD5 is also used to find duplicate images
Our choice of document IDs in both collections ensures good balance across Mongo shards
We never query both collections in a single service request to ensure data consistency and to have only one index lookup
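The two ID choices above can be sketched as follows. The values of N and M, the width of each site DB ID range, and the function names are assumptions made for the example; only the ideas (split points aligned with existing ID ranges, MD5 of image content as `_id`) come from the slides.

```python
import hashlib
from bisect import bisect_right

N_SITE_SHARDS = 8          # hypothetical N
M_MONGO_SHARDS = 4         # hypothetical M
IDS_PER_SITE_SHARD = 1000  # hypothetical width of each site DB ID range

# Site shard i owns item IDs [i * 1000, (i + 1) * 1000). Using every (N / M)-th
# range boundary as a split point keeps each site shard on one Mongo shard.
step = N_SITE_SHARDS // M_MONGO_SHARDS
split_points = [i * step * IDS_PER_SITE_SHARD for i in range(1, M_MONGO_SHARDS)]


def item_shard(item_id: int) -> int:
    """Mongo shard for an Item, chosen by range over the pre-split points."""
    return bisect_right(split_points, item_id)


def image_id(content: bytes) -> str:
    """_id for an Image: the MD5 hex digest of the image bytes."""
    return hashlib.md5(content).hexdigest()


def is_duplicate(content: bytes, known_ids: set) -> bool:
    """Identical image bytes produce the same _id, so a set lookup finds duplicates."""
    return image_id(content) in known_ids
```

Since the digest is effectively uniformly distributed, range-based chunks over it fill evenly regardless of how many shards exist, which is the property the slide relies on.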
Zoom: Service Topology and Configuration
MongoS is deployed on app servers
• Ensures network IO on MongoS won’t become a bottleneck
• This is a very familiar pattern in eBay as was explained in the beginning of this presentation
M shards; each replica set has 6 members
• 3 + 3 in 2 data centers
• Master can be only in one DC during automatic failover; manual failover may activate another DC
• One slave in the secondary DC is invisible for reads and is dedicated to periodic backups/snapshots (more on this later)
For reads, the client first sets SlaveOK=true and, if the required document is not found, flips to SlaveOK=false to read from the master
A home-grown MongoDB configuration and monitoring agent runs on every node
• Fetches MongoD configuration from a central configuration store and saves it to a local config file
• Manages lifecycle of MongoD
• Monitors state and metrics
[Diagram: shards arranged as columns and replicas as rows, split between DC1 (Primary) and DC2 (Secondary); nodes are labeled M and B]
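The read pattern described above (try a slave first, then fall back to the master) can be sketched independently of any driver. The two callables are hypothetical stand-ins for a read issued with SlaveOK=true and one issued with SlaveOK=false.

```python
def read_document(doc_id, read_from_slave, read_from_master):
    """Try a nearby slave first; fall back to the master if the document is missing."""
    doc = read_from_slave(doc_id)       # corresponds to SlaveOK=true
    if doc is None:
        # The slave may simply be lagging behind replication, so re-check the master.
        doc = read_from_master(doc_id)  # corresponds to SlaveOK=false
    return doc


# Dictionaries stand in for the two nodes; the slave has not yet received "b".
master = {"a": 1, "b": 2}
slave = {"a": 1}
```

This covers the read-your-own-insert case for new documents; it does not detect a stale version of a document that already exists on the slave.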
Zoom: Data Backup and Restore strategy
Goals:
• Take periodic backups of the entire data set
• Be able to recover from backup
• Do not lose any writes that have happened after the last snapshot
• Brief service unavailability during recovery is better than data loss …
Dual writes on the client
• Regular write to main cluster
• Second write to another Mongo cluster: single replica set, capped collection, the data written is similar to a REDO log record
Hidden slave in each shard has a volume mounted on a remote storage appliance capable of instant file system snapshots; captures both DB files and journal files
If DB recovery is activated:
• All MongoD processes on the primary cluster are shut down
• NFS slave is remounted to the snapshot volume
• MongoD on this machine is started as a master
• MongoD on other replica set members are started cold
• Full sync-up from master
• Master is switched to a regular member
• Writes that occurred since the time the backup was taken are replayed from the REDO log capped collection in the secondary cluster
[Diagram: the Application writes to the main cluster (nodes labeled M and B) and dual-writes to a capped collection (C); an instant-snapshot-capable device and a Recovery Agent are also shown]
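The dual-write and replay idea can be modeled in memory as below. This is a toy model under stated assumptions: the `Cluster` class, the `seq` counter used to order writes, and the record layout are all hypothetical, and the list standing in for the capped collection is assumed to be large enough to hold every write since the last snapshot.

```python
import copy


class Cluster:
    """Tiny in-memory stand-in for the main cluster plus the REDO log."""

    def __init__(self):
        self.data = {}      # main cluster contents
        self.redo_log = []  # stand-in for the capped collection in the second cluster
        self.seq = 0        # hypothetical sequence number that orders the writes

    def write(self, key, value):
        """Dual write: apply to the main cluster and append a REDO record."""
        self.seq += 1
        self.data[key] = value
        self.redo_log.append({"seq": self.seq, "key": key, "value": value})

    def snapshot(self):
        """Capture the data together with the position in the write stream."""
        return {"seq": self.seq, "data": copy.deepcopy(self.data)}

    def recover(self, snap):
        """Restore the snapshot, then replay REDO records written after it."""
        self.data = copy.deepcopy(snap["data"])
        for record in self.redo_log:
            if record["seq"] > snap["seq"]:
                self.data[record["key"]] = record["value"]
```

Replaying records in sequence order is what makes the result match the pre-failure state even when the same key was written several times after the snapshot.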
Key Learning
MongoDB can be a very powerful tool but use it wisely
Deletes can be slow; automatic balancer is dangerous; use it only when you must (example: be careful when adding new shards)
Use explain for every query; disable full scans to discover inefficiencies early
Query profiler is great
Retry every failed query at least once; long tail in response times is possible when data set > RAM size
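A retry wrapper for the "retry every failed query at least once" advice could look like this sketch. `with_retry` is a hypothetical helper; a real client would catch only the driver's own error types rather than every exception.

```python
def with_retry(query, retries=1):
    """Run `query`; on failure, retry up to `retries` more times before giving up."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return query()
        except Exception as exc:  # illustrative; narrow this to driver errors in practice
            last_error = exc
    raise last_error
```

Retries are only safe as written for reads and idempotent writes; a non-idempotent write that timed out may already have been applied.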
Questions?
Thank you!