Provides a start-to-finish overview of how to implement MongoDb for an e-commerce platform. It includes the data schema design, best practices and considerations, information about deployment and some of the operational procedures.
CASE STUDY: USING MONGODB FOR AN E-COMMERCE PLATFORM
VERSION 1.0: JULY 19, 2011
AUTHOR: HENNIE GROBLER ([email protected])
Overview
Scope
Sources
System Definition
Use Cases
Constraints and assumptions
Define the Schema
Identify System Operations
Identify Entities and Fields
MongoDb Best Practices and Considerations
Entity Relationships
Size of Data
Indexing
Adding indexes
Filter Criteria
Sorting
Considerations
Query Optimization
Sharding
Automatic Sharding
Sharding Key
Considerations
Using the _id (or date based data) as the shard key
Read / Write Ratio
Related Data
Unique Keys
Result Order
Bringing it all together
Entities
Product
Category
User
Shopping Cart
Actions
Search for product based on SKU
Search for products by product name
Search for products by category identifier
Increment / decrement stock item
Add / Edit products
Create Shopping Cart
Problem
Define the correct shard key
Split read and write data
Add / Remove products to / from shopping cart
Pay for cart by credit card
Search for all categories
Search for products less than reorder threshold
Search for sub-categories by category identifier
Search total product value
Search cart total per date
Discard Shopping Cart
Infrastructure
Deployment
Mongo Processes
Replica Sets[12]
Operating System
RAM
Network
Next Steps
References
Overview
MongoDb has garnered much attention over the last couple of years. It is said to be fast and reliable, and to automate some processes that are usually very time consuming and error prone. Adoption seems to be growing steadily as it is used in more and more high-transaction-volume systems such as Foursquare, Bit.ly and Sourceforge.
MongoDb seemed like the 'way to go', but then some reports of downtime surfaced, as was the case with Foursquare (MongoDB Auto-sharding and Foursquare Downtime[21]), and I realised that it is not a 'quick fix' solution that can be applied to every scenario.
Financial systems seemed to be the most unsuitable type of application to use with a MongoDb
back-end. I am still not 100% convinced that MongoDb can be used with all types of financial
systems, especially not banking systems, but I believe that it may be suitable for most e-commerce
systems.
I found the following to be the most obvious issues when starting a MongoDb implementation:
• Schema Design: The schema designs used for MongoDb and MySql implementations are vastly different, but because developers are generally used to designing for relational databases, they are prone to making poor design decisions.
• Sharding: MongoDb has many built-in features that reduce the operational procedures that must be in place, but not understanding how these features work could cause serious system problems.
• Experience: MongoDb is a relatively new technology compared to relational counterparts like MySql, which means that there are correspondingly few experienced MongoDb developers and administrators in the field.
This document tries to address these issues by providing an end-to-end overview of an imaginary e-commerce system built on MongoDb, instead of the numerous disjointed examples found on the internet.
Scope
The document covers the creation of the data schema for the e-commerce system, and provides an overview of the infrastructure and some of the operational procedures that must be in place to get started with a MongoDb implementation. It does not, however, discuss the implementation of the e-commerce website itself.
We will assume that the system has a limited amount of functionality, as defined in subsequent sections. This provides a set of parameters for the use case and avoids an overly complex design that could be confusing and hide some of the lessons that can be learned from it.
Sources
This document is based on theoretical knowledge of the topic, but all statements, conclusions and examples in it are based on information found on the MongoDb site, other use cases and various blogs that are freely available on the internet.
All sources are listed at the end of the document. It is recommended that these additional resources also be studied in order to get the maximum benefit from this document.
System Definition
Based on what we have been taught about relational database design, there is only one correct design for a given problem. The approach would normally be to analyse the data, identify all the prominent entities that are represented by the data, create a table for each and then create the appropriate relationships between the tables.
Once all of the normalization (and sometimes de-normalization) rules had been applied, the design was done. With MongoDb databases this process differs, as the data schema cannot be designed without first evaluating what the system will do with the data.
Use Cases
The system will be limited to the following use cases:
• A user can
1. register on the site
2. log in on the site with username (email) and password
3. view products from a specific category
4. search the product list based on the name of the product
5. view a specific product
6. add n number of different products to a shopping cart
7. remove products from a shopping cart
8. discard a shopping cart
9. pay for a shopping cart by credit card
• The system must
10. track product stock levels
• An accountant can view the following reports:
11. Total daily, monthly and yearly income earned from online sales ordered by date
12. Total value of stock on hand
• An inventory clerk can:
13. Add / Edit Products
14. Set the inventory stock level reorder threshold per product (the level at which an order must be placed, otherwise the shop will run out of stock)
Constraints and assumptions:
• Email addresses are unique
• Product Identifiers are unique
• A user must be logged in to be able to make a payment
• Shopping Cart
◦ Limited to 500 line items.
◦ Each line item will be a unique product. If a product that already exists in the cart is added again, the quantity of the existing line item is increased by the quantity of the new line item
• A category can be a sub-category of another category
• Passwords saved to the database must be made up of a cryptographic hash of the
password with an added salt value (random value)
• A product's stock level is only increased when new stock is added to the inventory, and only decreased once an item has been added to a cart and that cart has been successfully paid for
• The system supports user roles (User, Accountant, Inventory Clerk) where each role has
access to different functionality
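The password constraint above can be sketched with the built-in crypto module of Node.js. This is a minimal sketch, not the case study's own method: scrypt is an assumed choice of key-derivation function (the text only asks for a salted cryptographic hash), and the field names password and password_salt follow the sample user document in the Entities section.

```javascript
const crypto = require("crypto");

// Hash a plain-text password with a random per-user salt.
// scrypt is an assumption; any salted, deliberately slow password hash would satisfy the constraint.
function hashPassword(plainText, salt = crypto.randomBytes(16).toString("hex")) {
  const hash = crypto.scryptSync(plainText, salt, 64).toString("hex");
  return { password: hash, password_salt: salt }; // field names as in the users sample
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(plainText, userDoc) {
  const expected = Buffer.from(userDoc.password, "hex");
  const actual = crypto.scryptSync(plainText, userDoc.password_salt, expected.length);
  return crypto.timingSafeEqual(expected, actual);
}

const stored = hashPassword("s3cret");
console.log(verifyPassword("s3cret", stored)); // true
console.log(verifyPassword("wrong", stored)); // false
```

Because the salt is stored with the user, a login would look the user up by email only and then call verifyPassword, which is why the salt has to be read together with the password hash.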
Define the Schema
This case study will use the following steps to identify the final data schema:
1. Identify the operations that the system needs to support, based on the system functionality
2. Identify the entities that the operations 'interact' with
3. Identify meta-data of the entities
4. View how the entities are used in the system in relation to one another
5. Bring it all together by using the findings from the first four steps and applying some best
practice rules to them
Identify System Operations
The following actions were identified based on the previously defined functionality and are ordered according to a likely usage scenario. The order is for demonstration only and may vary depending on the actual implementation.
The table also shows which system function (use case number) each action relates to, the type of operation, and the potential entities and fields that were identified.
| # | Action | System Function | Type | Subject(s) / Fields |
|---|---|---|---|---|
| 1 | Search for product based on SKU | 5 | Read | Product (SKU) |
| 2 | Search for all categories | 3 | Read | Category |
| 3 | Search for products by category identifier | 3 | Read | Product / Category (id) |
| 4 | Search for sub-categories by category identifier | 3 | Read | Category (id, parent_id) |
| 5 | Search for products by product name | 4 | Read | Product (name) |
| 6 | Create shopping cart | 6 | Write | Cart |
| 7 | Add / remove products to / from shopping cart | 6, 7 | Write | Cart (line items) |
| 8 | Pay for cart by credit card | 9 | Write | Cart, Payment (credit card info) |
| 9 | Increment / decrement stock item | 10 | Write | Product (items_in_stock) |
| 10 | Find user by email (not by email and password, because the salt must be returned to calculate the correct password hash) | 2 | Read | User (email, password, salt) |
| 11 | Save new / existing user (similar to Add / remove products from shopping cart, so will be discarded) | 1 | Write | User |
| 12 | Add / Edit products | 13 | Write | Product |
| 13 | Search for products less than reorder threshold | 14 | Read | Product (reorder_threshold) |
| 14 | Search cart total per date (ordered) | 11 | Read | Cart (date, total) |
| 15 | Search total product value | 12 | Read | Product (cost_price) |
| 16 | Discard shopping cart | 8 | Delete | Cart |
Identify Entities and Fields
The previous section identified the different system entities and also identified some fields. We will
now expand on this by reviewing the constraints and assumptions. We will also add some
additional attributes that will probably be required by a real system to make this example more
complete.
| Entity | Fields |
|---|---|
| Product | name, SKU, cost_price, selling_price, items_in_stock, reorder_threshold |
| Category | id, parent_id, list of products |
| Cart | date, total |
| Cart Line Item | product info, quantity |
| Payment | credit card info |
| User | firstname, lastname, email, password, salt, shipping address, role (user, accountant, inventory clerk) |
MongoDb Best Practices and Considerations
Entity Relationships
Each of the entities would most probably be modelled as an individual table in a relational database, but this is not necessarily the case in a MongoDb database. One of the biggest factors in deciding how the data is modelled is how the entities are accessed in relation to one another.
For example, if an invoice and its line items are always accessed together, then it would be better for performance to model them as one document. Alternatively, if line items are regularly accessed individually, then it would probably be better to model them as separate documents.
Based on the current use cases, we will therefore model the Shopping Cart and its Cart Line Items as one document.
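The difference between the two models can be sketched with plain JavaScript objects. Field names follow the Shopping Cart sample in the Entities section; the cart ids, the separate line item list and its cart_id field are hypothetical and only exist to show the referenced alternative.

```javascript
// Embedded model: the cart and its line items are one document.
const cart = {
  _id: "cart-1", // hypothetical id
  line_items: [
    { name: "Ipad", selling_price: 320, qty: 2 },
    { name: "Ipod Nano", selling_price: 120, qty: 1 },
  ],
};

// One document read provides everything needed to total the cart.
function cartTotal(cartDoc) {
  return cartDoc.line_items.reduce((sum, item) => sum + item.selling_price * item.qty, 0);
}

// Referenced model: line items live in their own collection and point back to
// the cart through a hypothetical cart_id field, so a second lookup is required.
const lineItems = [
  { cart_id: "cart-1", name: "Ipad", selling_price: 320, qty: 2 },
  { cart_id: "cart-1", name: "Ipod Nano", selling_price: 120, qty: 1 },
  { cart_id: "cart-2", name: "Ipad", selling_price: 320, qty: 1 },
];

function cartTotalReferenced(cartId, items) {
  return items
    .filter((item) => item.cart_id === cartId)
    .reduce((sum, item) => sum + item.selling_price * item.qty, 0);
}

console.log(cartTotal(cart)); // 760
console.log(cartTotalReferenced("cart-1", lineItems)); // 760
```

Both return the same total, but the embedded version needs a single document fetch, while the referenced version needs the cart plus a query over the line item collection.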
Size of Data
The maximum size of a document in MongoDb is currently limited to 8 MB, but a maximum size of 32 MB has been proposed and this will probably increase even further in future. It may sound like a good idea to store very large objects in a document, but consider that the whole document must travel across the network between the database server and the application server whenever it is accessed.
In cases where only part of the document is accessed each time it is retrieved, it would be less resource intensive to split the document into smaller documents.
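One way to act on this is to keep the fields that every request needs in a small document and move rarely read, bulky fields into a second document that shares the same _id. The sketch assumes a hypothetical description field and uses JSON length only as a rough stand-in for the BSON size that MongoDb actually enforces.

```javascript
// Split a document into a small summary and a detail document keyed by the same _id.
function splitDocument(doc, detailFields) {
  const summary = {};
  const detail = { _id: doc._id };
  for (const [key, value] of Object.entries(doc)) {
    if (detailFields.includes(key)) {
      detail[key] = value;
    } else {
      summary[key] = value;
    }
  }
  return { summary, detail };
}

// Rough size in bytes; BSON encoding differs from JSON, so this is only an approximation.
function approxSize(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const product = {
  _id: "4e1b091559a4f01109000000",
  name: "Ipad",
  selling_price: 320,
  description: "x".repeat(50000), // hypothetical bulky field that is rarely displayed
};

const { summary, detail } = splitDocument(product, ["description"]);
console.log(approxSize(product), approxSize(summary), approxSize(detail));
```

The summary could then be read for product listings, and the detail fetched by _id only when the full product page is shown.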
Indexing
Adding indexes to your collections could significantly increase the query performance as MongoDb
can quickly navigate the index to find the relevant document by key instead of scanning each
document in the collection.
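As a simplified depiction of how the system navigates the index to find the relevant entry (in this case the user with the surname Straub) without having to scan every document in the collection, picture the surname index as a tree with King at the root, Harris and Rice on the second level, and Bachman, Graham, Koontz and Straub on the bottom level. Because Straub sorts after King, the search follows the branch under Rice; because it also sorts after Rice, it then reaches the Straub entry, which points to the matching document.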
MongoDb automatically creates an index on the _id field, but additional indexes can be added as required.
Adding indexes
Filter Criteria
The fields that indexes are applied to depend on the queries that the system runs. In our use case the system will 'Search for products based on SKU', so we can define an index on the SKU field of the document.
Sorting
Based on the 'Search cart total per date (ordered)' system action, we would also need to add an index on the date, as the query is sorted by date. Adding an index on the sort field enables MongoDb to return the data in order by walking the index, without having to load and sort every document.
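The two indexes discussed above could be created roughly as follows with the official Node.js driver, whose Collection.createIndex(keys, options) returns a promise. This is a sketch: db is assumed to be a connected Db instance, the index names are illustrative (they match the names MongoDb generates by default), and the field names sku and sales_date follow the sample documents in the Entities section.

```javascript
// Index definitions for the filter (sku) and sort (sales_date) cases.
const indexPlan = [
  { collection: "products", key: { sku: 1 }, name: "sku_1" },
  { collection: "cart", key: { sales_date: 1 }, name: "sales_date_1" },
];

// Create every planned index; MongoDb does not rebuild an identical existing index.
async function ensureIndexes(db) {
  for (const { collection, key, name } of indexPlan) {
    await db.collection(collection).createIndex(key, { name });
  }
}

// Queries that can use those indexes.
function findBySku(db, sku) {
  return db.collection("products").findOne({ sku }); // equality filter on sku
}

function cartsByDate(db) {
  return db.collection("cart").find({}).sort({ sales_date: 1 }).toArray(); // sort on sales_date
}
```

Keeping the index definitions in one list makes it easy to run ensureIndexes at application start-up and to review which queries each index serves.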
Considerations
The following must be taken into consideration when applying indexes:
• Additional Overhead: Index entries are added / removed whenever documents are added to / removed from the collection (and updated when an indexed field changes). This does not pose a problem in systems that mostly perform read operations, but in write-heavy systems it may incur significant overhead, as the index must be continuously updated.
• Initial Index Blocking: No queries can be run against the database while the index is first being built, unless the {background:true} option[9] is used.
• Case Sensitive: MongoDb indexes are case sensitive.
• Indexes per Collection: There is a limit of 40 indexes per collection. In most cases this number is more than sufficient.
• Index Key Size: Currently, the maximum length of a key that can be indexed is 800 bytes.
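Because of the case-sensitivity point above, a common workaround is to store a normalised copy of a searchable field and index that copy. The sketch below assumes a hypothetical name_lower field that is not part of the sample schema, and passes the {background: true} option mentioned above to the Node.js driver's createIndex. Later MongoDb releases changed how index builds work and ignore the background option; they also offer case-insensitive indexes through collations, which would be an alternative to the extra field.

```javascript
// Add a lower-cased copy of the product name before saving (hypothetical field).
function withSearchFields(product) {
  return { ...product, name_lower: product.name.toLowerCase() };
}

// Normalise user input the same way so that it matches the stored copy.
function nameFilter(userInput) {
  return { name_lower: userInput.trim().toLowerCase() };
}

// Index on the normalised field, built in the background so that other
// operations are not blocked while it is created.
async function createNameIndex(db) {
  await db.collection("products").createIndex({ name_lower: 1 }, { background: true });
}

console.log(withSearchFields({ name: "Ipod Nano" }).name_lower); // ipod nano
console.log(nameFilter("  IPOD nano ")); // { name_lower: 'ipod nano' }
```

The cost is a second copy of each name on disk and in the index, which is usually small compared with scanning the collection on every search.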
Query Optimization
As with indexes on a relational database, you sometimes get unexpected results, so it is good practice to verify that a query uses the intended index and that using the index actually results in better performance. This can be done by examining the query execution plan returned by the explain()[10] command.
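A check of this kind can be automated. The sketch below walks the winning plan returned by explain() and reports whether it contains an index scan. It assumes the explain format of MongoDb 3.0 and later (queryPlanner.winningPlan with nested inputStage / inputStages, and in some newer releases a nested queryPlan). The releases current when this case study was written reported a cursor field such as "BtreeCursor sku_1" instead, so the function would need adapting for them. The sample plans are hand-written, not captured from a server.

```javascript
// Return true if the winning plan of an explain() result contains an IXSCAN stage.
function usesIndex(explainResult) {
  const stack = [explainResult.queryPlanner.winningPlan];
  while (stack.length > 0) {
    const stage = stack.pop();
    if (stage.stage === "IXSCAN") return true;
    if (stage.queryPlan) stack.push(stage.queryPlan); // wrapper used by some newer releases
    if (stage.inputStage) stack.push(stage.inputStage);
    if (stage.inputStages) stack.push(...stage.inputStages);
  }
  return false;
}

// Hand-written examples of the two outcomes.
const indexedPlan = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", keyPattern: { sku: 1 } } },
  },
};
const scanPlan = { queryPlanner: { winningPlan: { stage: "COLLSCAN" } } };

console.log(usesIndex(indexedPlan)); // true
console.log(usesIndex(scanPlan)); // false
```

With the Node.js driver the input would come from a call along the lines of await db.collection("products").find({ sku }).explain().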
Sharding
Automatic Sharding
MongoDb supports automatic sharding[1], where data is automatically spread across multiple servers in order to distribute the transaction load. The system accomplishes this by dividing the data into contiguous ranges of shard key values (called chunks[2]) that are distributed across multiple servers. By default each chunk can be up to a maximum of 200 MB in size, but this can be overridden to be larger.
Once a chunk reaches approximately 50%-75% (100 MB to 150 MB) of the maximum size, MongoDb will create a snapshot of the chunk and copy the snapshot data to the new chunk. Writes can still be made to the original chunk while this copy operation is in progress. Once the copy process has completed, the changes made to the original chunk are applied to the new chunk before it is made available.
Sharding Key
MongoDb uses a key called the shard key to decide which chunk data will be allocated to. The shard key is chosen when a collection is sharded: it can be the _id field, which by default holds a BSON ObjectId (see BSON ObjectId Specification[3]), or any other user-defined value, such as another field or a combination of fields.
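As a sketch of choosing a key other than _id, the admin commands below shard a users collection on lastname, matching the example that follows. enableSharding and shardCollection are MongoDb admin commands that must be sent to a mongos router (here through the Node.js driver's Db.command); the database name shop is hypothetical, and a collection that already holds data also needs an existing index that starts with the shard key.

```javascript
// Build the commands that enable sharding and shard one collection on a chosen key.
function shardingCommands(dbName, collName, shardKey) {
  return [
    { enableSharding: dbName },
    { shardCollection: `${dbName}.${collName}`, key: shardKey },
  ];
}

// Send them to the admin database through a mongos (client: a connected MongoClient).
async function shardCollection(client, dbName, collName, shardKey) {
  const admin = client.db("admin");
  for (const command of shardingCommands(dbName, collName, shardKey)) {
    await admin.command(command);
  }
}

console.log(shardingCommands("shop", "users", { lastname: 1 }));
```

Separating the command construction from sending it keeps the shard key decision easy to review and test before it is applied to a live cluster.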
A shard key for user documents could, for example, be based on the user's last name. With that in mind, imagine that we have three chunks of user data: the first chunk contains all the users with a surname from B to H (Bachman, Richard to Harris, Thomas), the second from Ki to Ko (King, Stephen to Koontz, Dean) and the third from R to S (Rice, Anne to Straub, Peter).
If a user with the last name Barker is added, the document will be written to the first chunk, whereas a user with the last name Smith will be written to the last chunk.
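The routing decision in this example can be modelled in a few lines. This is a simplified in-memory model, not MongoDb's implementation: real chunks cover the whole key space without gaps (from a minimum to a maximum key), so the split points "I" and "R" and the shard names below are assumptions chosen to reproduce the three ranges in the text.

```javascript
// Chunk table: min is inclusive, max is exclusive, null means an open bound.
const chunks = [
  { min: null, max: "I", shard: "shard0" }, // Bachman ... Harris
  { min: "I", max: "R", shard: "shard1" }, // King ... Koontz
  { min: "R", max: null, shard: "shard2" }, // Rice ... Straub
];

// Find the chunk whose range contains the shard key value.
function chunkFor(lastname) {
  return chunks.find(
    (c) => (c.min === null || lastname >= c.min) && (c.max === null || lastname < c.max)
  );
}

console.log(chunkFor("Barker").shard); // shard0
console.log(chunkFor("Smith").shard); // shard2
```

Plain string comparison is case sensitive, so in this model "barker" would land in a different chunk from "Barker"; storing names consistently avoids that surprise.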
Considerations
Deciding on the correct shard key may be one of the most significant design decisions that are
made during the design process as it could have a major impact, positive or negative, on system
performance. The following are some considerations to note.
Using the _id (or date based data) as the shard key
MongoDb automatically adds an _id attribute to each document (if not overridden by application code) and populates it with a unique value (see BSON ObjectId Specification [3]). An ObjectId consists of several values that are concatenated to form a (relatively) unique value, and the first part of this value is calculated from the current date and time.
This could be an advantage, as data is automatically stored in date order, which would increase the performance of queries that select data by date range or need to order results by date. This fact can also be exploited in other ways. For example, most drivers support extracting the creation date and time from the _id, which means that storing a 'created at' value in the document is not required.
On the other hand, according to the MongoDb website it could also have some implications for scalability. Because new values always increase, new documents are all written to the chunk that holds the highest key range: at the beginning of each month, for example, documents will be written to the same server until the data chunks are migrated across to other servers. This issue can be mitigated by adding some uniqueness to the key and pre-splitting chunks[7].
Read / Write Ratio
The read / write ratio that the system will experience must also be carefully considered. If the
system experiences many reads it would be better for performance if the whole query can be
satisfied from one shard and preferably one document. Alternatively if the system experiences
many writes it would be better if the shard keys are defined in such a way that the writes are
distributed between multiple servers in order to spread the workload. This can be achieved by
adding more uniqueness to the shard key.
If the system experiences an exceptionally high volume of writes, the way that the MongoDb balancer handles the splitting of chunks could also become an issue, as described in the 'MongoDB Pre-Splitting for Faster Data Loading and Importing'[8] article.
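One possible way to add uniqueness in front of an increasing value is to prefix it with a small bucket number derived from a hash of another field, giving a compound shard key such as { bucket: 1, created_at: 1 }. This is a sketch of that technique, not something the case study prescribes: the FNV-1a hash, the bucket count of four and the field names are all assumptions.

```javascript
// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Shard key values: a bucket derived from the user id, followed by the timestamp.
function bucketedKey(userId, createdAt, buckets = 4) {
  return { bucket: fnv1a(userId) % buckets, created_at: createdAt };
}

// Simulate 1,000 inserts made at the same moment and count the writes per bucket.
const now = new Date();
const counts = [0, 0, 0, 0];
for (let i = 0; i < 1000; i++) {
  counts[bucketedKey(`user-${i}`, now).bucket] += 1;
}
console.log(counts); // every bucket receives part of the writes
```

The trade-off is the one described under Result Order: a query for a date range now has to visit every bucket and merge the results.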
Related Data
Keeping related data close together will improve system performance, as all the data can be retrieved from one chunk or shard. In a system with lots of user-related content, we may prefix the shard key with the user id. We could 'force' the system to store different documents containing user-related information, like personal data, uploaded media and purchase history, close together by prefixing each document's _id with the particular user id.
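The effect of prefixing can be seen by sorting the keys the way a range-based shard key orders them. In this sketch the "<userId>_<suffix>" format mirrors the line item _id in the Shopping Cart sample ("1_4e1b09..."), while the user ids and suffixes are hypothetical.

```javascript
// Build an _id that starts with the owning user's id.
function prefixedId(userId, suffix) {
  return `${userId}_${suffix}`;
}

const ids = [
  prefixedId("u2", "profile"),
  prefixedId("u1", "media-001"),
  prefixedId("u2", "order-001"),
  prefixedId("u1", "profile"),
  prefixedId("u1", "order-001"),
];

// Sorting as strings, as a range-based shard key would, groups each user's documents.
const ordered = [...ids].sort();
console.log(ordered.join(", "));
// u1_media-001, u1_order-001, u1_profile, u2_order-001, u2_profile
```

Because every key of one user shares the same prefix, those keys form one contiguous range, so they tend to stay in the same chunk until that chunk has to be split.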
Unique Keys
Shard keys should normally be as unique as possible, because MongoDb can only split data into smaller chunks at the boundaries between different shard key values. Depending on the system, performance issues may start to appear once chunks grow past the default 200 MB maximum size.
For example, using State (e.g. Texas and Ohio) as the shard key for user-related data may cause problems in the future, as MongoDb will have to write the data for ALL users that live in a particular state to the same chunk, and because it cannot split that chunk, it would grow to be very large.
If the key is changed to include City it would allow MongoDb to create a chunk for each State+City
combination which allows for a lot more granularity.
If it is also considered that each State+City chunk is potentially stored on a different server and that
some cities have more users than others, it becomes clear that some servers will experience
higher loads than others.
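The difference in granularity can be measured by counting the distinct shard key values that each candidate key produces. A chunk can never be split between documents that share the same shard key value, so the largest group sets a lower bound on how small the biggest chunk can become. The user records below are hypothetical sample data.

```javascript
const users = [
  { state: "Texas", city: "Houston" },
  { state: "Texas", city: "Houston" },
  { state: "Texas", city: "Austin" },
  { state: "Texas", city: "Dallas" },
  { state: "Ohio", city: "Columbus" },
  { state: "Ohio", city: "Dayton" },
];

// Group documents by the fields of a candidate shard key.
function keyStats(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    const key = fields.map((f) => doc[f]).join("|");
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return { distinctKeys: groups.size, largestGroup: Math.max(...groups.values()) };
}

console.log(keyStats(users, ["state"])); // { distinctKeys: 2, largestGroup: 4 }
console.log(keyStats(users, ["state", "city"])); // { distinctKeys: 5, largestGroup: 2 }
```

Houston still forms the largest group under the compound key, which corresponds to the uneven load between servers described above.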
Result Order
The order in which search results are returned to the client can also affect the selection of an
appropriate shard key. Continuing with the State / City example let us imagine that we defined a
shard key of {state:1,city:1} on our data and that the relevant data returned by a query is
stored on multiple servers.
• If the query returns data ordered by city, each server will need to compile the search results and then sort the data. The data is then returned from each server and the results are merged into one by the mongos process (see the Deployment section). The extra sorting step is needed because there is no index defined on the city field alone, only on the combination of State+City.
• If, on the other hand, the query sorts by state or state+city, each server will compile the data and stream it back in order to the mongos process without having to sort it first, as it can use the defined index; mongos then only has to merge the already ordered results.
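The merge step in the second case can be illustrated with a plain k-way merge: each shard's results are already ordered by the index, so the router only needs to repeatedly take the smallest head element. This is a simplified model of what mongos does, not its implementation, and the per-shard results are hypothetical.

```javascript
// Merge several already-sorted arrays into one sorted array (k-way merge).
function mergeSorted(streams, compare) {
  const positions = streams.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < streams.length; i++) {
      if (positions[i] >= streams[i].length) continue; // this stream is exhausted
      if (best === -1 || compare(streams[i][positions[i]], streams[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged; // every stream is exhausted
    merged.push(streams[best][positions[best]]);
    positions[best] += 1;
  }
}

// Order defined by the { state: 1, city: 1 } shard key.
const byStateCity = (a, b) => a.state.localeCompare(b.state) || a.city.localeCompare(b.city);

// Hypothetical results, each already ordered by the index on its own shard.
const shardA = [{ state: "Ohio", city: "Columbus" }, { state: "Texas", city: "Austin" }];
const shardB = [{ state: "Ohio", city: "Dayton" }, { state: "Texas", city: "Houston" }];

const merged = mergeSorted([shardA, shardB], byStateCity);
console.log(merged.map((u) => `${u.state}/${u.city}`).join(", "));
// Ohio/Columbus, Ohio/Dayton, Texas/Austin, Texas/Houston
```

Merging costs one comparison per shard for each returned document, whereas a city-ordered query would first need a full sort on every shard.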
Bringing it all together
After reviewing the system functionality as well as some of the best practices and considerations
we are able to create our document schemas and define the queries that will be run against the
system.
Entities
Based on the 'Identify Entities and Fields' section we can assume that the documents would
resemble the following samples. The structure and content of these documents may change further
as the different actions are considered in the following section.
Product
Each product document will have the following structure and will be allocated to the products
collection. Categories will also be stored in the product document but will be discussed in detail in
a subsequent section.
Collection: products

    { "_id": ObjectId("4e1b091559a4f01109000000"), "name": "Ipad", "sku": "10001-23424-9098",
      "cost_price": 300, "selling_price": 320, "items_in_stock": 9, "reorder_threshold": 10 }
    { "_id": ObjectId("4e1b08e159a4f01608000000"), "name": "Ipod Nano", "sku": "10001-23424-9098",
      "cost_price": 100, "selling_price": 120, "items_in_stock": 10, "reorder_threshold": 15 }

Category
Category documents will be allocated to the categories collection and will not be sharded, as all the category documents together make up a relatively small amount of data. We will also override the default generated _id, as it is very long; the reason for this will be explained later on. Fortunately, categories will not be updated often, which means that the performance hit of using a custom incremental _id for categories is acceptable.
Collection: categories

    { "_id": "1", "name": "Electronics", "subcats": ["2", "3"] }
    { "_id": "2", "name": "Cellular", "parents": ["1"], "subcats": ["3"] }
    { "_id": "3", "name": "Nokia", "parents": ["1", "2"] }

User
User documents will be allocated to their own collection called users.
Collection: users

    { "_id": ObjectId("4e1bfba789a4f02207000000"), "firstname": "John", "lastname": "Doe",
      "email": "[email protected]", "password": "[hashed_text]", "password_salt": "[salt_text]",
      "shipping_address": { "address1": "33 Rainbow Road", "city": "Cape Town", "postal_code": "8000" },
      "role": "user" }

Shopping Cart
The shopping cart, products in the cart and the payment made for the cart will always be queried
together which means that the data can be stored as one document. Each of the line items will
become an array item in the document. Some of the product data was duplicated into the cart
object which prevents additional database lookups when completing actions like previewing the
cart or generating an invoice or even reprinting an invoice a year after it was paid for.
The payment details and some of the user details will also be stored in the document.
Collection: cart

    {
      "_id": ObjectId("4e1bfba559a4f02207000000"),
      "line_items": [
        { "_id": "1_4e1b091559a4f01109000000", "cost_price": 300, "name": "Ipad",
          "selling_price": 320, "sku": "10001-23424-9098", "qty": 2 },
        { "_id": ObjectId("4e1b08e159a4f01608000000"), "cost_price": 100, "name": "Ipod Nano",
          "selling_price": 120, "sku": "10001-23424-9098", "qty": 1 }
      ],
      "payment": { "card_number": "[encrypted_text]", "expiry": "11/12", "card_holder": "Mr J Doe" },
      "sales_date": "2011-07-12 09:45:36",
      "total": 760,
      "user": {
        "id": ObjectId("4e1bfba789a4f02207000000"), "name": "John Doe", "email": "[email protected]",
        "shipping_address": { "address1": "33 Rainbow Road", "city": "Cape Town", "postal_code": "8000" },
        "role": "user"
      }
    }

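The line item rules from 'Constraints and assumptions' can be applied to this document shape before it is saved. The sketch below is an in-memory illustration: it identifies products through a hypothetical product_id field (the sample reuses one SKU for two products, so the SKU cannot tell them apart) and recomputes the denormalised total on every change.

```javascript
const MAX_LINE_ITEMS = 500; // limit from 'Constraints and assumptions'

// Add qty units of a product to a cart document, merging with an existing line.
function addToCart(cart, product, qty) {
  const existing = cart.line_items.find((item) => item.product_id === product._id);
  if (existing) {
    existing.qty += qty; // same product: increase the existing line's quantity
  } else {
    if (cart.line_items.length >= MAX_LINE_ITEMS) {
      throw new Error(`A cart is limited to ${MAX_LINE_ITEMS} line items`);
    }
    cart.line_items.push({
      product_id: product._id, // hypothetical field identifying the product
      name: product.name,
      sku: product.sku,
      cost_price: product.cost_price,
      selling_price: product.selling_price, // copied so later price changes do not alter the cart
      qty,
    });
  }
  // Keep the denormalised total in step with the line items.
  cart.total = cart.line_items.reduce((sum, item) => sum + item.selling_price * item.qty, 0);
  return cart;
}

const ipad = { _id: "4e1b091559a4f01109000000", name: "Ipad", sku: "10001-23424-9098", cost_price: 300, selling_price: 320 };
const ipodNano = { _id: "4e1b08e159a4f01608000000", name: "Ipod Nano", sku: "10001-23424-9098", cost_price: 100, selling_price: 120 };

const cart = { line_items: [], total: 0 };
addToCart(cart, ipad, 1);
addToCart(cart, ipodNano, 1);
addToCart(cart, ipad, 1); // merged into the existing Ipad line
console.log(cart.line_items.length, cart.total); // 2 760
```

On the server, the same merge could be done atomically with the positional operator, for example an update that matches {"line_items.product_id": id} and applies {$inc: {"line_items.$.qty": n}}, again using the hypothetical product_id field.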
Actions
The actions are not ordered as defined in the 'Identify System Operations' section, as some of the discussions build on previous ones.
Note: All of the following examples refer to the document examples defined in the 'Entities' section unless otherwise specified.
Search for product based on SKU
Add an index on the SKU field of the product document
Search for products by product name
Add an index on the name field of the product document
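Because indexes are case sensitive and match from the start of the value, a name search can be written as an anchored prefix regular expression, which MongoDb can answer from the name index. The helper below first escapes regex metacharacters in user input. It is a sketch, and the filter it returns would be passed to something like db.collection("products").find(...) in the Node.js driver.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored (^), case-sensitive prefix match on the indexed name field.
function namePrefixFilter(prefix) {
  return { name: new RegExp("^" + escapeRegExp(prefix)) };
}

const filter = namePrefixFilter("Ipod (");
console.log(filter.name.source); // ^Ipod \(
```

An unanchored pattern or one with the i flag would force MongoDb to examine many more index keys, so case-insensitive search is better served by a normalised field.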
Search for products by category identifier
Based on one of the best practices it is better to combine all the information related to a specific
entity into one document so that the system can satisfy the query without having to retrieve
multiple documents.
That would suggest that we save all of the products into the specific category document. In a
normal e-commerce system we will have hundreds or thousands of products over time which will
result in very large documents.
We could opt to model the product and category entities as separate documents which means that
these documents should somehow reference each other.
In our design we will add the category _id to the product document like this:
{ "_id": ObjectId("4e1b091559a4f01109000000"), "name": "Ipad", .... "category" : “10”}
We could then add an index on the category field in order to quickly find all products in a particular category.
We could alternatively embed the whole category document in the category field if required. This approach would take more disk space because of the duplicated data, but if category information needs to be displayed on the front end alongside the product, it could save an extra query to the database. This is probably only an option if the category information is relatively static.
In cases where a product can belong to multiple categories we could use an array of category ids:
{ "_id": ObjectId("223b091559a4f01109000000"), "name": "Nokia", .... "categories": ["1", "2"] }
Querying for a specific value in an array field is supported by MongoDb with the Multikey feature[13].
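To make the multikey behaviour concrete, here is a small in-memory model in plain JavaScript (not MongoDb code). It assumes a `categories` field shaped like the Nokia example, and the sample products and helper names are hypothetical. It shows that an index over an array field holds one entry per element, so an equality query on a single category id finds every product that lists it.

```javascript
// In-memory model of a multikey index: one entry per array element.
// This only illustrates the idea; MongoDb builds and uses its indexes itself.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // field value -> list of document _ids
  for (const doc of docs) {
    const raw = doc[field];
    // A scalar is indexed once; each distinct array element is indexed separately.
    const values = Array.isArray(raw) ? new Set(raw) : new Set([raw]);
    for (const value of values) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Equality semantics for a query such as { categories: "2" }: a document
// matches if the field equals the value or is an array containing it.
function matchesEquality(doc, field, value) {
  const raw = doc[field];
  return Array.isArray(raw) ? raw.includes(value) : raw === value;
}

// Hypothetical sample products.
const products = [
  { _id: "nokia", name: "Nokia", categories: ["1", "2"] },
  { _id: "ipad", name: "Ipad", categories: ["1"] },
  { _id: "case", name: "Phone case", categories: "2" },
];

const byCategory = buildMultikeyIndex(products, "categories");
console.log(byCategory.get("2")); // [ 'nokia', 'case' ]
console.log(products.filter((p) => matchesEquality(p, "categories", "2")).map((p) => p.name));
```

Both lookups agree: the Nokia document is reachable from category "1" and from category "2" without being stored twice.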
Increment / decrement stock item
The items_in_stock field will in essence be a counter that is incremented or decremented when an item is added to stock or sold. In this case the system does not need to return the document to the client; it can increment or decrement the field in place.
Updating a document[14] will normally take this form:
var prodCollection = db.products;
var product = prodCollection.findOne({_id: ObjectId("4e1b091559a4f01109000000")});
product.items_in_stock++;
prodCollection.save(product);
However, we can use a modifier[15], which is much more efficient and can be used for atomic updates[16] on the document. We will most probably query for a product by _id, which automatically has an index defined on it.
Use the following to increment the items_in_stock without retrieving the whole document (note the
$inc operator):
db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : 1}});
or the following to decrement the stock level:
db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : -1}});
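The benefit of a modifier is easiest to see when two application servers sell the same product at the same moment. The plain JavaScript simulation below is a simplified model, not driver code: an object stands in for the collection and the function names are hypothetical. It replays the read-modify-write pattern and the $inc pattern against the same starting document.

```javascript
// One product document held in a plain object that stands in for the collection.
function newStore() {
  return { product: { _id: "4e1b091559a4f01109000000", name: "Ipad", items_in_stock: 10 } };
}

// Read-modify-write (findOne, change, save) by two clients whose reads overlap.
function twoSalesWithSave(store) {
  const seenByA = { ...store.product }; // client A reads
  const seenByB = { ...store.product }; // client B reads before A has saved
  seenByA.items_in_stock--;
  seenByB.items_in_stock--;
  store.product = seenByA; // A saves the whole document
  store.product = seenByB; // B saves the whole document, overwriting A's sale
}

// Models { $inc: { items_in_stock: delta } }: the delta is applied to whatever
// value is stored at the moment of the update, so no sale is lost.
function incStock(store, delta) {
  store.product.items_in_stock += delta;
}

const withSave = newStore();
twoSalesWithSave(withSave);
console.log(withSave.product.items_in_stock); // 9: one of the two sales was lost

const withInc = newStore();
incStock(withInc, -1); // client A
incStock(withInc, -1); // client B
console.log(withInc.product.items_in_stock); // 8
```

In MongoDb itself the $inc update is applied by the server to a single document atomically, which is what the second half of the simulation stands for.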
Add / Edit products
The most important aspect when editing data is deciding on the shard key as this will influence
which shard the data will be written to and how the data will be located during queries.
Adding and editing products will not happen often compared with other types of transactions, which means that the default _id should be sufficient as the shard key. However, because searching for products by category is a high volume transaction, we could prefix the product _id with the category _id so that all products in a category are grouped together, as shown in the following example:
{ "_id": "1", "name": "Electronics"}
{ "_id": "1_4e1b091559a4f01109000000", "name": "Ipad", .... "category" : "1"}
Another side effect of prepending the category, for systems where a product can only belong to one category, is that we potentially do not have to store the category as a separate field, as it can be derived from the product _id.
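The category-prefixed _id can be built and parsed with two small helpers. This is a plain JavaScript sketch; the helper names and the extra sample id are hypothetical, and it assumes category ids never contain the "_" separator.

```javascript
// Build a product _id such as "1_4e1b091559a4f01109000000".
// Assumes category ids never contain the "_" separator.
function makeProductId(categoryId, objectIdHex) {
  const cat = String(categoryId);
  if (cat.includes("_")) throw new Error("category id must not contain '_': " + cat);
  return cat + "_" + objectIdHex;
}

// Recover the category from a product _id, so no separate category field is needed.
function categoryFromProductId(productId) {
  const sep = productId.indexOf("_");
  if (sep < 0) throw new Error("not a category-prefixed id: " + productId);
  return productId.slice(0, sep);
}

// Every id in a category shares the "<category>_" prefix, so sorting the ids
// (as shard key ranges do) keeps each category's products contiguous.
const ids = [
  makeProductId("2", "4e1b0a0059a4f01109000000"),
  makeProductId("1", "4e1b091559a4f01109000000"),
  makeProductId("1", "4e1b08e159a4f01608000000"),
].sort();
console.log(ids.map(categoryFromProductId)); // [ '1', '1', '2' ]
```

In the mongo shell, an anchored, case-sensitive prefix regular expression such as db.products.find({ _id: /^1_/ }) can use the _id index to fetch one category's products; the trailing "_" in the pattern keeps category "10" from matching.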
Create Shopping Cart
Problem
As with the product entity, careful consideration is required when deciding what the shopping cart shard key will consist of. In a high transaction volume environment there will be a very large number of writes as new shopping carts are created and items are added and removed. Once a cart is paid, it will mostly be read, for reporting and similar purposes.
This makes the design difficult. Indexes, for example, allow fast retrieval of the data after payment but slow down writes while the purchase is in progress. Similarly, a shard key based on the date would make the data easier to query but is risky, because all writes could go to the same shard instead of being spread out over many shards.
Running regular data-intensive queries for reports could also hurt system performance and degrade the user experience.
Define the correct shard key
In our case we will avoid using the default generated _id: default ObjectIds begin with a timestamp, so new carts would all be written to the same shard, causing excessive writes to one server at certain times during the month. A similar issue was described in the last paragraph of the 'Using the _id or date based data as the shard key' section.
There are many different ways to generate a unique value that can be used for the _id of your document. Most approaches combine a couple of values to get a unique value. In some systems it may be sufficient to concatenate the user id and the date. We could be even more inventive and use the name of the application server that the transaction was generated on, or the hexadecimal representation of the user's IP address[19] (e.g. IP 196.134.96.111 = hex C4 86 60 6F), to help make values unique.
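The IP-to-hex conversion, and one possible way of combining it with the user id and date, are sketched below in plain JavaScript. The `makeCartKey` helper and its `userId_YYYYMMDD_IPHEX` layout are illustrative assumptions, not a scheme defined elsewhere in this case study.

```javascript
// Convert a dotted-quad IPv4 address to 8 upper-case hex digits,
// e.g. 196.134.96.111 -> "C486606F".
function ipToHex(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error("not an IPv4 address: " + ip);
  }
  return parts.map((p) => Number(p).toString(16).toUpperCase().padStart(2, "0")).join("");
}

// Hypothetical composite key: user id + sale date (UTC, YYYYMMDD) + client IP in hex.
function makeCartKey(userId, date, ip) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are zero-based
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${userId}_${y}${m}${d}_${ipToHex(ip)}`;
}

console.log(ipToHex("196.134.96.111")); // C486606F
console.log(makeCartKey("4e1bfba789a4f02207000000", new Date(Date.UTC(2011, 6, 12)), "196.134.96.111"));
// 4e1bfba789a4f02207000000_20110712_C486606F
```

Two carts created by the same user, from the same address, on the same day would still collide under this layout, which is why such combinations are only sufficient in some systems.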
In our use case we will keep it simple and use a GUID[20] as the unique key. We could also pre-split chunks[7] if necessary. This will ensure that write operations to the cart collection are distributed across many shards.
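To illustrate why random keys spread writes while time-ordered keys do not, the sketch below assigns keys to four hypothetical pre-split chunks bounded on the first hex digit. It uses crypto.randomUUID() where the runtime exposes it on globalThis.crypto (modern browsers and recent Node.js versions) and otherwise falls back to a Math.random-based version-4 layout, which is fine for a demonstration but not for production keys. The chunk boundaries and the ObjectId-like sequential keys are assumptions for the example.

```javascript
// GUID generator: Web Crypto's randomUUID when available, otherwise a
// Math.random fallback using the RFC 4122 version-4 layout (demo quality only).
function newGuid() {
  const c = globalThis.crypto;
  if (c && typeof c.randomUUID === "function") return c.randomUUID();
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (ch) => {
    const r = Math.floor(Math.random() * 16);
    return (ch === "x" ? r : (r & 0x3) | 0x8).toString(16);
  });
}

// Four hypothetical pre-split chunks on lower-case hex keys:
// chunk 0 = [min, "4"), 1 = ["4", "8"), 2 = ["8", "c"), 3 = ["c", max).
const splitPoints = ["4", "8", "c"];

function chunkFor(key) {
  let i = 0;
  while (i < splitPoints.length && key >= splitPoints[i]) i++;
  return i;
}

function distribution(keys) {
  const counts = new Array(splitPoints.length + 1).fill(0);
  for (const key of keys) counts[chunkFor(key)]++;
  return counts;
}

// Time-ordered keys shaped like ObjectIds created in mid-2011 (timestamp first).
const sequential = Array.from({ length: 1000 }, (_, i) =>
  (0x4e1bfba5 + i).toString(16) + "59a4f02207000000"
);
const guids = Array.from({ length: 1000 }, newGuid);

console.log(distribution(sequential)); // [ 0, 1000, 0, 0 ]: every insert hits one chunk
console.log(distribution(guids)); // roughly 250 per chunk
```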
Adding and removing line items to and from the cart can be done most efficiently by using the $push and $pull[14] modifiers, which add and remove array elements in place. Because an index is automatically created on the _id field, finding the documents by _id will also be fast.
Finally, once a cart is paid, we can use the $set[14] modifier to add the payment details.
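The three update documents can be written as below. The operator names are MongoDb's; the helper functions, the `carts` collection name in the comment and the toy `applyUpdate` (which supports only these three cases) are hypothetical, included so the effect on a cart can be checked without a database.

```javascript
// Update documents for the cart operations described above. The operators
// ($push, $pull, $set) are MongoDb's; the helper names are hypothetical.
// In the mongo shell one would pass them to update(), for example:
//   db.carts.update({ _id: cartId }, addLineItem(item));   // "carts" is assumed
function addLineItem(item) {
  return { $push: { line_items: item } };
}

function removeLineItem(lineItemId) {
  // $pull with a condition removes every array element whose _id matches.
  return { $pull: { line_items: { _id: lineItemId } } };
}

function recordPayment(payment) {
  return { $set: { payment: payment } };
}

// Toy applier for just these three shapes (top-level fields, $pull matching
// on _id), so the effect on a cart document can be checked without a server.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      if (op === "$push") {
        doc[field] = (doc[field] || []).concat([arg]);
      } else if (op === "$pull") {
        doc[field] = (doc[field] || []).filter((el) => el._id !== arg._id);
      } else if (op === "$set") {
        doc[field] = arg;
      } else {
        throw new Error("operator not modelled: " + op);
      }
    }
  }
  return doc;
}

const cart = { _id: "a3f1c2d4-0000-4000-8000-000000000001", line_items: [] };
applyUpdate(cart, addLineItem({ _id: "1_4e1b091559a4f01109000000", name: "Ipad", qty: 2 }));
applyUpdate(cart, addLineItem({ _id: "1_4e1b08e159a4f01608000000", name: "Ipod Nano", qty: 1 }));
applyUpdate(cart, removeLineItem("1_4e1b08e159a4f01608000000"));
applyUpdate(cart, recordPayment({ card_holder: "Mr J Doe", expiry: "11/12" }));
console.log(cart.line_items.map((li) => li.name), cart.payment.card_holder); // [ 'Ipad' ] Mr J Doe
```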
Split read and write data
Data volumes in this collection will eventually grow very large and may affect performance. This is especially true because reporting queries and other search queries will be run against the same database. We will therefore use two collections, one for 'active' carts and another for 'completed' carts. The active cart collection will have no additional indexes in order to cater for the frequent updates, whereas the completed cart collection will have more indexes to cater for the different search queries.
Moving data between collections will cause extra overhead on the system, so we will split this processing into different parts. We will assume that real-time (or as close to real-time as possible) reporting is required, which means that we cannot use a deferred job that moves the data during a low transaction volume period such as midnight to 3 AM.
There are various approaches, but we will go with a more complex option in order to demonstrate some of MongoDb's less well-known features.
When a cart is paid and payment details are saved we will add an additional field called
'processed' using the $set[14] modifier. This field will have a sparse[5] index defined on it. Sparse
indexes only include documents that contain the field that the index is defined on.
A separate server process will query the database at intervals and retrieve all the documents that have a 'processed' field. Because of the sparse index this will be a very efficient query, and it will add little overhead to writes, as only documents containing the 'processed' field are included in the index. These documents will be retrieved and saved into the second collection, and once a document has been moved the 'processed' field will be removed from the original. Because the field is removed, that document will not be returned by subsequent 'data move' queries. Care needs to be taken to ensure that both of these updates happen in an atomic[16] fashion.
A third process will be run during a low transaction volume period. This job will remove all documents from the first collection that already exist in the second.
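The three-step move can be simulated in plain JavaScript to check the logic. Maps stand in for the 'active' and 'completed' collections, and the function names are hypothetical. The sketch also fills in the cart 'total' during the move (the stored aggregate used for the per-date reporting query), drops the flag from the copy as a design choice, and writes the copy under the same _id, so re-running the move after a failure overwrites the earlier copy instead of duplicating it.

```javascript
// Maps stand in for the two collections: _id -> cart document.
const active = new Map();
const completed = new Map();

// Step 1, at payment time: models $set of the payment details plus the
// 'processed' flag that the sparse index is defined on.
function markPaid(cartId, payment) {
  const cart = active.get(cartId);
  cart.payment = payment;
  cart.processed = true;
}

// Step 2, run at intervals: copy every flagged cart (adding its total), then
// remove the flag from the original so the next run skips it.
function moveProcessed() {
  let moved = 0;
  for (const cart of active.values()) {
    if (!("processed" in cart)) continue; // only these would be in the sparse index
    const copy = { ...cart };
    delete copy.processed; // the flag is not needed in the completed copy
    copy.total = cart.line_items.reduce((sum, li) => sum + li.selling_price * li.qty, 0);
    completed.set(cart._id, copy); // same _id: a retry overwrites, never duplicates
    delete cart.processed;
    moved++;
  }
  return moved;
}

// Step 3, off-peak: delete active carts that already exist in 'completed'.
function purgeMoved() {
  for (const id of [...active.keys()]) {
    if (completed.has(id)) active.delete(id);
  }
}

active.set("cart-1", {
  _id: "cart-1",
  line_items: [
    { name: "Ipad", selling_price: 320, qty: 2 },
    { name: "Ipod Nano", selling_price: 120, qty: 1 },
  ],
});
active.set("cart-2", { _id: "cart-2", line_items: [] }); // still being shopped
markPaid("cart-1", { card_holder: "Mr J Doe" });
console.log(moveProcessed(), moveProcessed()); // 1 0
console.log(completed.get("cart-1").total); // 760
purgeMoved();
console.log([...active.keys()]); // [ 'cart-2' ]
```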
Add / Remove products to / from shopping cart
See the 'Create Shopping Cart' section.
Pay for cart by credit card
See the 'Create Shopping Cart' section.
Search for all categories
Due to the static nature of the category data it would most probably be cached on the client
application server instead of being queried for continuously. This means that nothing additional
would be required for this query except possibly an index on the category name if the result of the
'Search all categories' query must be sorted by name.
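A minimal sketch of that application-side cache, in plain JavaScript. `loadCategories` is a hypothetical function supplied by the application (for example one that runs a find() on the categories collection), the time-to-live is an assumption, and the sort by name is done in the application rather than relying on a database index.

```javascript
// Cache the category list in process memory and reload it only when the
// cached copy is older than ttlMs. `now` is injectable so the cache can be tested.
function createCategoryCache(loadCategories, ttlMs, now = () => Date.now()) {
  let cached = null;
  let loadedAt = 0;
  return function getCategories() {
    if (cached === null || now() - loadedAt >= ttlMs) {
      // Copy before sorting so the loader's array is not modified.
      cached = [...loadCategories()].sort((a, b) => a.name.localeCompare(b.name));
      loadedAt = now();
    }
    return cached;
  };
}

// Usage with a stub loader that counts how often the "database" is queried.
let queries = 0;
const getCategories = createCategoryCache(() => {
  queries++;
  return [
    { _id: "1", name: "Electronics" },
    { _id: "2", name: "Cellular" },
    { _id: "3", name: "Nokia" },
  ];
}, 10 * 60 * 1000); // assumed 10-minute time-to-live

getCategories();
getCategories();
console.log(queries); // 1: the second call is served from the cache
```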
Search for products less than reorder threshold
Finding documents where the value of a field is less than a constant can be done with the first query below, but MongoDb does not yet support using the $lt operator to compare against another field's value, as shown in the second query.
> db.products.find({items_in_stock: {$lt:20}})
{ "_id" : ObjectId("4e1b091559a4f01109000000"), "items_in_stock" : 9, "name" : "Ipad", "reorder_threshold" : 10 }
> db.products.find({items_in_stock: {$lt:reorder_threshold}})
Mon Jul 11 14:43:28 ReferenceError: reorder_threshold is not defined (shell):0
We can, however, make use of a map-reduce[17] function.
In this use case the query will access all the product documents in the collection; it has no filter criteria and does not require sorting, which makes it a good candidate for map-reduce. The following example is adapted from the 'Finding Max And Min Values for a given Key' article[18].
Based on the example data (Entities section) the result is expected to look like this:
{ _id : "1_497ce4051ca9ca6d3efca323", value : { product : { name : "Ipod Nano" , items_below_level : 5 } } }
{ _id : "1_678ce4051ca9ca6d3efca323", value : { product : { name : "Ipad" , items_below_level : 1 } } }
Explaining map-reduce is out of scope for this document, but suffice it to say that the map function is applied to each document and the reduce function to the values emitted for each key. Our map function checks whether the number of items in stock for a particular product is below the set threshold and, if it is, emits a value. The reduce function would normally be used to aggregate values (e.g. sums, counts and averages), but in our case each product _id is emitted at most once, so the function simply returns the first value.
> map = function () {
    if (this.items_in_stock < this.reorder_threshold) {
        var x = {name: this.name, items_below_level: (this.reorder_threshold - this.items_in_stock)};
        emit(this._id, {product: x});
    }
}
> reduce = function (key, values) { return values[0]; }
Running the mapReduce command will have the following output:
> db.products.mapReduce(map, reduce, {out: {inline: true}});
{
    "result" : "tmp.mr.mapreduce_1310385961_11",
    "timeMillis" : 5,
    "counts" : {
        "input" : 2,
        "emit" : 2,
        "output" : 2
    },
    "ok" : 1,
}
> db.tmp.mr.mapreduce_1310385961_11.find()
{ "_id" : ObjectId("4e1add3b59a4f0d213000000"), "value" : { "product" : { "name" : "Ipod Nano", "items_below_level" : 5 } } }
{ "_id" : ObjectId("4e1add5c59a4f04906000000"), "value" : { "product" : { "name" : "Ipad", "items_below_level" : 1 } } }
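The map and reduce functions can be exercised without a database using a tiny in-memory stand-in for mapReduce. This is a sketch only: `runMapReduce` is a hypothetical helper, it makes emit() available to the map function as the server does, it skips reduce for keys with a single value (as MongoDb does), and the Ipod Nano and Nokia stock figures are assumed sample data.

```javascript
// Minimal in-memory stand-in for mapReduce: groups emitted values by key and
// reduces each group. Real MongoDb mapReduce also handles sharding, re-reduce
// and output collections, none of which is modelled here.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-encoded key -> { key, values }
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // map sees the document as `this`
  } finally {
    delete globalThis.emit;
  }
  const results = [];
  for (const { key, values } of groups.values()) {
    // Like MongoDb, reduce is not called when a key has only one value.
    results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return results;
}

// The map and reduce functions from the example above.
const map = function () {
  if (this.items_in_stock < this.reorder_threshold) {
    const x = { name: this.name, items_below_level: this.reorder_threshold - this.items_in_stock };
    emit(this._id, { product: x });
  }
};
const reduce = function (key, values) {
  return values[0];
};

// Ipad figures come from the example data; the other two products are assumed.
const products = [
  { _id: "1_4e1b091559a4f01109000000", name: "Ipad", items_in_stock: 9, reorder_threshold: 10 },
  { _id: "1_4e1b08e159a4f01608000000", name: "Ipod Nano", items_in_stock: 15, reorder_threshold: 20 },
  { _id: "1_4e1b0a0059a4f01109000000", name: "Nokia", items_in_stock: 40, reorder_threshold: 10 },
];

for (const r of runMapReduce(products, map, reduce)) console.log(r._id, r.value.product);
```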
Search for sub-categories by category identifier
As mentioned under 'Search for all categories', the category / sub-category hierarchy will most
probably be retrieved, calculated and cached in some form or another on the client side which
means that we do not need to make any changes to accommodate this query.
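As a sketch of that client-side calculation, the functions below build a category to sub-category lookup from cached category documents, using `subcats` arrays like those in the example data. The helper names and sample documents are hypothetical; ids are normalised to strings because the example data mixes string category ids with numeric sub-category ids.

```javascript
// Build a map from category id to its direct sub-category documents.
function buildHierarchy(categories) {
  const byId = new Map(categories.map((c) => [String(c._id), c]));
  const children = new Map();
  for (const c of categories) {
    const subIds = (c.subcats || []).map(String);
    children.set(String(c._id), subIds.map((id) => byId.get(id)).filter(Boolean));
  }
  return children;
}

// Every category below rootId, depth first. `seen` skips a category that is
// listed under more than one ancestor and protects against cycles.
function descendants(children, rootId, seen = new Set()) {
  const out = [];
  for (const child of children.get(String(rootId)) || []) {
    const id = String(child._id);
    if (seen.has(id)) continue;
    seen.add(id);
    out.push(child, ...descendants(children, id, seen));
  }
  return out;
}

// Sample categories consistent with the example data.
const categories = [
  { _id: "1", name: "Electronics", subcats: [2, 3] },
  { _id: "2", name: "Cellular", subcats: [3] },
  { _id: "3", name: "Nokia" },
];

const children = buildHierarchy(categories);
console.log(children.get("1").map((c) => c.name)); // [ 'Cellular', 'Nokia' ]
console.log(descendants(children, "2").map((c) => c.name)); // [ 'Nokia' ]
```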
In a scenario where caching is not possible, we could use a map-reduce [17] function to return the
appropriate category hierarchy.
In our use case assume that categories have sub-categories and that sub-categories can have
their own sub-categories as shown in the example data.
We can use the following map-reduce functions to retrieve the data in the appropriate category
hierarchy.
> map = function () {
    var key = {id: this._id, name: this.name};
    if (!this.subcats) {
        var value = {subcats: ['none']};
        emit(key, value);
    } else {
        for (var i = 0; i < this.subcats.length; i++) {
            var value = {subcats: [this.subcats[i]]};
            emit(key, value);
        }
    }
}
> reduce = function (key, values) {
    var result = {subcats: []};
    for (var i = 0; i < values.length; i++) {
        result.subcats = values[i].subcats.concat(result.subcats);
    }
    result.subcats = result.subcats.sort();
    return result;
}
> db.categories.mapReduce(map, reduce, {out: {inline: true}});
{
    "result" : "tmp.mr.mapreduce_1310454989_43",
    "timeMillis" : 2,
    "counts" : {
        "input" : 3,
        "emit" : 4,
        "output" : 3
    },
    "ok" : 1,
}
> db.tmp.mr.mapreduce_1310454989_43.find()
{ "_id" : { "id" : "1", "name" : "Electronics" }, "value" : { "subcats" : [ 2, 3 ] } }
{ "_id" : { "id" : "2", "name" : "Cellular" }, "value" : { "subcats" : [ 3 ] } }
{ "_id" : { "id" : "3", "name" : "Nokia" }, "value" : { "subcats" : [ "none" ] } }
Search total product value
As mentioned, aggregating multiple values is another use for map-reduce[17]. In this use case we need to find the total cost value of all the products in stock, that is, the sum of items_in_stock multiplied by cost_price across all products. The following shows how this can be achieved:
> map = function () {
    emit("sub_total", this.items_in_stock * this.cost_price);
}
> reduce = function (key, values) {
    var grand_total = 0;
    for (var i = 0; i < values.length; i++) {
        grand_total += values[i];
    }
    return grand_total;
}
> db.products.mapReduce(map, reduce, {out: {inline: true}});
{
    "result" : "tmp.mr.mapreduce_1310392963_13",
    "timeMillis" : 3,
    "counts" : {
        "input" : 2,
        "emit" : 2,
        "output" : 1
    },
    "ok" : 1,
}
> db.tmp.mr.mapreduce_1310392963_13.find()
{ "_id" : "sub_total", "value" : 4200 }
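One property worth checking for any reduce function: MongoDb may call reduce more than once for the same key, passing the output of earlier reduce calls back in as values, so the result must have the same form as the emitted values and reducing partial results must give the same answer as a single pass. The plain JavaScript check below uses the reduce function from this example; the per-product figures are assumed sample data (the first two are consistent with the 4200 total above).

```javascript
// The reduce function from the example above.
const reduce = function (key, values) {
  let grandTotal = 0;
  for (let i = 0; i < values.length; i++) {
    grandTotal += values[i];
  }
  return grandTotal;
};

// Values map would emit (items_in_stock * cost_price); assumed sample figures.
const emitted = [9 * 300, 15 * 100, 4 * 250];

const onePass = reduce("sub_total", emitted);
// Re-reduce: reduce two partial batches, then reduce the partial results.
const twoPass = reduce("sub_total", [
  reduce("sub_total", emitted.slice(0, 2)),
  reduce("sub_total", emitted.slice(2)),
]);
console.log(onePass, twoPass); // 5200 5200

// Counter-example: counting values breaks under re-reduce, because a partial
// count arrives as a single value.
const countReduce = function (key, values) {
  return values.length;
};
console.log(countReduce("n", [1, 1, 1]), countReduce("n", [countReduce("n", [1, 1]), countReduce("n", [1])])); // 3 2
```

Summing is safe to re-reduce; a count should instead emit 1 per document and sum the values.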
Search cart total per date
As with relational databases, it is sometimes worthwhile to store aggregated data. To satisfy this query we will add an extra field called 'total' to the document and populate it with the cart total during the data move step described in the 'Create Shopping Cart' section. We will then use a map-reduce query to group the totals by month (the year and month of the sales date).
> map = function () {
    var millis = Date.parse(this.sales_date.substr(0, 10));
    var sales_dt = new Date(millis);
    // getUTCMonth() is zero-based (January = 0), so add 1 for the calendar month
    var key = sales_dt.getUTCFullYear().toString() + '-' + (sales_dt.getUTCMonth() + 1).toString();
    var value = this.total;
    emit(key, value);
}
> reduce = function (key, values) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
        result += values[i];
    }
    return result;
}
> db.cart.mapReduce(map, reduce, {out: {inline: true}});
{
    "result" : "tmp.mr.mapreduce_1310471886_57",
    "timeMillis" : 4,
    "counts" : {
        "input" : 2,
        "emit" : 2,
        "output" : 1
    },
    "ok" : 1,
}
> db.tmp.mr.mapreduce_1310471886_57.find()
{ "_id" : "2011-7", "value" : 1080 }
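JavaScript months are zero-based, which is an easy trap when building date keys such as the one in this map function: getMonth() and getUTCMonth() return 6 for July. The plain JavaScript sketch below, with a hypothetical `monthKey` helper and assumed sample carts, builds a calendar-month key and totals carts per month, which is also a handy way to check a map-reduce result.

```javascript
// Build a "YYYY-M" key from a sales_date such as "2011-07-12 09:45:36".
// A date-only ISO string is parsed as UTC, so the UTC getters are used to
// avoid shifting the date into another month in local time zones.
function monthKey(salesDate) {
  const d = new Date(Date.parse(salesDate.slice(0, 10)));
  return d.getUTCFullYear() + "-" + (d.getUTCMonth() + 1); // getUTCMonth(): January = 0
}

// Sum cart totals per month, the same grouping the map-reduce performs.
function totalsByMonth(carts) {
  const totals = {};
  for (const cart of carts) {
    const key = monthKey(cart.sales_date);
    totals[key] = (totals[key] || 0) + cart.total;
  }
  return totals;
}

// The first cart matches the example document; the second is assumed.
const carts = [
  { sales_date: "2011-07-12 09:45:36", total: 760 },
  { sales_date: "2011-07-14 16:20:00", total: 320 },
];
console.log(totalsByMonth(carts)); // { '2011-7': 1080 }
```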
Discard Shopping Cart
We do not have to make any changes to accommodate this query, as we will simply find the document by one of the indexed fields and then remove the document from the collection. This will not happen often, so we do not have to worry about the overhead of maintaining indexes.
Infrastructure
Deployment
Based on the MongoDb documentation[11] we will start with a setup as shown in the following diagram. This setup distributes queries across multiple shards, which improves performance; it keeps three copies of the data available (one on each server in the replica set[12]); and it allows for disaster recovery scenarios by replicating to servers in another data centre.
Mongo Processes
The bulk of MongoDb processing is handled by the processes depicted in the diagram. Mongod is the main database process. It performs the actual querying and editing of the data contained in the database.
Mongos, on the other hand, is only a routing service. A client application will communicate with the
mongos process which in turn will query the configuration store (config mongod in the diagram) to
find out which shard(s) to communicate with. It will then route the query to the appropriate shard(s)
and merge the results from the different shards where applicable, before it returns the combined
result to the client application. This method ensures that the client application only needs to be
aware of one process to communicate with and does not have to have intimate knowledge of all
the mongod processes.
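The routing decision described above can be sketched with a hypothetical chunk table, like the metadata the config servers hold. In this plain JavaScript model each chunk covers a [min, max) range of the shard key (assumed here to be a hex-string _id, compared as a string); a query that includes the shard key goes to a single shard, and any other query is sent to every shard so that the results can be merged. Shard names and ranges are assumptions, and real mongos routing also handles range queries and BSON type ordering.

```javascript
// Hypothetical chunk table; null stands for the open MinKey/MaxKey bounds.
const chunks = [
  { min: null, max: "4", shard: "shardA" },
  { min: "4", max: "8", shard: "shardB" },
  { min: "8", max: "c", shard: "shardC" },
  { min: "c", max: null, shard: "shardA" },
];

// Find the shard owning the chunk whose [min, max) range contains the key.
function shardForKey(key) {
  const chunk = chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk.shard;
}

// Targeted if the query carries an exact shard-key value, otherwise a
// scatter-gather across every shard that owns chunks.
function route(query) {
  if (typeof query._id === "string") return [shardForKey(query._id)];
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(route({ _id: "9f3c2a10-0000-4000-8000-000000000000" })); // [ 'shardC' ]
console.log(route({ "user.email": "someone@example.com" })); // [ 'shardA', 'shardB', 'shardC' ]
```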
Note that the mongos process can be run in many different configurations. It can be installed on all of the servers or only on some, and it can also be installed on separate servers with no mongod processes. There may be a performance boost if the service is installed on each server, as it will then be able to communicate over the localhost interface.
Replica Sets[12]
A replica set consists of two or more servers with the mongod process installed. One server in a replica set is elected primary (the 'master') and, by default, services all read and write requests. If the primary fails or becomes unavailable, one of the secondaries (the 'slaves') automatically becomes the primary and starts serving requests.
Operating System
MongoDb uses memory-mapped files to manage data, which means that the database size is limited to about 2 GB on 32-bit operating systems. Use a 64-bit operating system to support larger databases.
RAM
MongoDb uses memory-mapped files to manage data, which allows it to map data into memory as it appears on the hard disk. MongoDb will keep data in memory once it is queried for the first time (if possible) and use the in-memory data for subsequent queries, which is more efficient than reading from disk. Having a lot of memory available could speed up queries significantly, as the whole database could potentially be loaded into memory.
Network
Setting up replication and backups will increase network traffic which could affect the query
performance. Adding an extra network card and creating a separate network on which the servers
can communicate with replication and backup servers could also reduce network 'noise'.
Next Steps
References
1. Sharding: http://www.mongodb.org/display/DOCS/Sharding+Introduction
2. Chunk: http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-Chunks
3. BSON Object: http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
4. Choosing a Shard Key: http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
5. Indexing: http://www.mongodb.org/display/DOCS/Indexes
6. MongoTips: http://mongotips.com/b/a-few-objectid-tricks/
7. Splitting Chunks: http://www.mongodb.org/display/DOCS/Splitting+Chunks
8. MongoDB Pre-Splitting for Faster Data Loading and Importing: http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
9. Indexing as a Background Operation: http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation
10. Explain: http://www.mongodb.org/display/DOCS/Explain
11. Simple Initial Sharding Architecture: http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
12. Replica Sets: http://www.mongodb.org/display/DOCS/Replica+Sets
13. Multikeys: http://www.mongodb.org/display/DOCS/Multikeys
14. Update: http://www.mongodb.org/display/DOCS/Updating#Updating-update%28%29
15. Modifiers: http://www.mongodb.org/display/DOCS/Updating#Updating-ModifierOperations
16. Atomic Operations: http://www.mongodb.org/display/DOCS/Atomic+Operations
17. Map Reduce Basics: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
18. Finding Max And Min Values for a given Key: http://cookbook.mongodb.org/patterns/finding_max_and_min_values_for_a_key/
19. Calculate the hex value of an IP address: http://www.pocketnes.org/hexa.html
20. GUID: http://en.wikipedia.org/wiki/Globally_unique_identifier
21. MongoDB Auto-sharding and Foursquare Downtime: http://nosql.mypopescu.com/post/1251523059/mongodb-auto-sharding-and-foursquare-downtime