CASE STUDY: USING MONGODB FOR AN E-COMMERCE PLATFORM

VERSION 1.0: JULY 19, 2011

AUTHOR: HENNIE GROBLER


Contents

Overview
    Scope
    Sources
System Definition
    Use Cases
    Constraints and assumptions
Define the Schema
    Identify System Operations
    Identify Entities and Fields
MongoDB Best Practices and Considerations
    Entity Relationships
    Size of Data
    Indexing
        Adding indexes
            Filter Criteria
            Sorting
        Considerations
        Query Optimization
    Sharding
        Automatic Sharding
        Sharding Key
        Considerations
            Using the _id (or date based data) as the shard key
            Read / Write Ratio
            Related Data
            Unique Keys
            Result Order
Bringing it all together
    Entities
        Product
        Category
        User
        Shopping Cart
    Actions
        Search for product based on SKU
        Search for products by product name
        Search for products by category identifier
        Increment / decrement stock item
        Add / Edit products
        Create Shopping Cart
            Problem
            Define the correct shard key
            Split read and write data
        Add / Remove products to / from shopping cart
        Pay for cart by credit card
        Search for all categories
        Search for products less than reorder threshold
        Search for sub-categories by category identifier
        Search total product value
        Search cart total per date
        Discard Shopping Cart
Infrastructure
    Deployment
        Mongo Processes
        Replica Sets[12]
    Operating System
    RAM
    Network
Next Steps
References


Overview

MongoDB has garnered much attention over the last couple of years. It is said to be fast and reliable, and to automate some of the processes that are usually very time-consuming and error-prone. Adoption seems to be growing steadily as it is used in more and more high-transaction-volume systems such as Foursquare, Bit.ly and SourceForge.

MongoDB seemed like the 'way to go', but then reports of downtime surfaced, as was the case with Foursquare (MongoDB Auto-sharding and Foursquare Downtime[21]), and I realised that it is not a 'quick fix' solution that can be applied to all scenarios.

Financial systems seemed to be the least suitable type of application to use with a MongoDB back-end. I am still not 100% convinced that MongoDB can be used with all types of financial systems, especially not banking systems, but I believe that it may be suitable for most e-commerce systems.

I found the following factors to be the most obvious issues when starting a MongoDB implementation:

• Schema Design: The schema designs used for MongoDB and MySQL implementations are vastly different, but because developers are generally used to designing for relational databases they are prone to making some bad design decisions.
• Sharding: MongoDB has many built-in features that reduce the operational procedures that must be in place, but not understanding how these features work could cause serious system problems.
• Experience: MongoDB is a relatively new technology compared to relational counterparts like MySQL, which means that there are correspondingly few experienced MongoDB developers and administrators in the field.

This document tries to address the above-mentioned issues somewhat by providing an overall view of an imaginary e-commerce system built on MongoDB, instead of the numerous disjointed examples found on the internet.

Scope

The document covers the creation of the data schema for the e-commerce system, and provides an overview of the infrastructure and some of the operational procedures that must be in place to get started with a MongoDB implementation. It does not, however, discuss the actual e-commerce website implementation.

We will assume that the system has a limited amount of functionality, as defined in subsequent sections. This provides a set of parameters for the use case and avoids an overly complex design that could be confusing and therefore hide some of the lessons that can be taken away from it.

Sources

This document is based on theoretical knowledge of the topic, but all statements, conclusions and examples in it are based on information found on the MongoDB site, other use cases and various blogs that are freely available on the internet.

All sources are noted at the end of the document. It is recommended that these additional resources also be read in order to get the maximum benefit from this document.


System Definition

Based on what we have been taught about relational database design, there is only one correct design for a given problem. The approach would normally be to analyse the data, identify all the prominent entities that are represented by the data, create a table for each and then create the appropriate relationships between the tables.

Once all of the data normalization (and sometimes de-normalization) rules have been applied, the design is done. With MongoDB databases this process differs slightly, as the data schema cannot be designed without first evaluating what the system will do with the data.

Use Cases

The system will be limited to the following use cases:

• A user can:
  1. register on the site
  2. log in on the site with a username (email) and password
  3. view products from a specific category
  4. search the product list based on the name of the product
  5. view a specific product
  6. add any number of different products to a shopping cart
  7. remove products from a shopping cart
  8. discard a shopping cart
  9. pay for a shopping cart by credit card
• The system must:
  10. track product stock levels
• An accountant can view the following reports:
  11. Total daily, monthly and yearly income earned from online sales, ordered by date
  12. Total value of stock on hand
• An inventory clerk can:
  13. add / edit products
  14. set the reorder threshold per product (the stock level at which an order must be placed, otherwise the shop will run out of stock)


Constraints and assumptions:

• Email addresses are unique
• Product identifiers are unique
• A user must be logged in to be able to make a payment
• Shopping Cart
  ◦ Limited to 500 line items.
  ◦ Each line item will be a unique product. If a product is added to the cart that already exists in it, then the quantity of the original line item will increase by the quantity of the new line item.
• A category can be a sub-category of another category
• Passwords saved to the database must be stored as a cryptographic hash of the password with an added salt value (random value)
• The stock level of a product is only increased when new stock is added to the inventory, and only decreased once an item has been added to a cart and that cart has been successfully paid for

• The system supports user roles (User, Accountant, Inventory Clerk) where each role has

access to different functionality
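
The password constraint above can be illustrated with a minimal sketch in JavaScript (Node.js). It is not part of the case study's design: the helper names hashPassword and verifyPassword are hypothetical, the choice of scrypt from Node's crypto module is an assumption (the constraint only asks for a cryptographic hash with a salt), and the field names password and password_salt follow the user document used in this case study.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a salted hash suitable for storing in a user document.
// A fresh random salt is generated unless one is passed in.
function hashPassword(password, salt = crypto.randomBytes(16).toString("hex")) {
  // Assumption of this sketch: scrypt is used as the hash function.
  const hash = crypto.scryptSync(password, salt, 64).toString("hex");
  return { password: hash, password_salt: salt };
}

// Hypothetical helper: recompute the hash with the stored salt and
// compare the two values in constant time.
function verifyPassword(candidate, stored) {
  const expected = Buffer.from(stored.password, "hex");
  const actual = crypto.scryptSync(candidate, stored.password_salt, 64);
  return crypto.timingSafeEqual(expected, actual);
}
```

Because the salt is needed to recompute the hash, a login has to fetch the user by email first and only then verify the password.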

Define the Schema

This case study will use the following steps to arrive at the final data schema:

1. Identify the operations that the system needs to support, based on the system functionality
2. Identify the entities that the operations 'interact' with
3. Identify the meta-data of the entities
4. View how the entities are used in the system in relation to one another
5. Bring it all together by using the findings from the first four steps and applying some best practice rules to them


Identify System Operations

The following actions were identified based on the previously defined functionality and are ordered by how likely they are to occur in a typical usage scenario. The order is for demonstration only and may vary depending on the actual implementation.

The table also shows which system function (use case number) the action relates to, the type of operation and which potential entities and fields were identified.

#   Action                                            System Function  Type    Subject(s) / Fields
1   Search for product based on SKU                   5                Read    Product (SKU)
2   Search for all categories                         3                Read    Category
3   Search for products by category identifier        3                Read    Product / Category (id)
4   Search for sub-categories by category identifier  3                Read    Category (id, parent_id)
5   Search for products by product name               4                Read    Product (name)
6   Create shopping cart                              6                Write   Cart
7   Add / remove products to shopping cart            6, 7             Write   Cart (line items)
8   Pay for cart by credit card                       9                Write   Cart, Payment (credit card info)
9   Increment / decrement stock item                  10               Write   Product (items_in_stock)
10  Find user by email (note a)                       2                Read    User (email, password, salt)
11  Save new / existing user (note b)                 1                Write   User
12  Add / Edit products                               13               Write   Product
13  Search for products less than reorder threshold   14               Read    Product (reorder_threshold)
14  Search cart total per date (ordered)              11               Read    Cart (date, total)
15  Search total product value                        12               Read    Product (cost_price)
16  Discard shopping cart                             8                Delete  Cart

Note a: The user is found by email only, not by email and password, as the salt must be returned in order to calculate the correct password hash.
Note b: This action is similar to 'Add / remove products to shopping cart', so it will be discarded from further discussion.


Identify Entities and Fields

The previous section identified the different system entities and also identified some fields. We will now expand on this by reviewing the constraints and assumptions. We will also add some additional attributes that will probably be required by a real system, to make this example more complete.

Entity           Fields
Product          name, SKU, cost_price, selling_price, items_in_stock, reorder_threshold
Category         id, parent_id, list of products
Cart             date, total
Cart Line Item   product info, quantity
Payment          credit card info
User             firstname, lastname, email, password, salt, shipping address, role (user, accountant, inventory clerk)


MongoDB Best Practices and Considerations

Entity Relationships

Each of the entities would most probably be modelled as an individual table in a relational database, but this is not necessarily the case with a MongoDB database. One of the biggest factors in deciding how the data is modelled is how the entities are accessed in relation to one another.

For example, if an invoice and its line items are always accessed together then it would be better for performance to model them as one entity. Alternatively, if line items are regularly accessed individually, then it would probably be better to model them as separate entities.

Based on the current use cases, we will therefore model the Shopping Cart and Cart Line Items as one document.

Size of Data

The maximum size of a document in MongoDB is currently limited to 16 MB, but a maximum size of 32 MB has been proposed and the limit may increase further in future. It may sound like a good idea to store very large objects in a document, but consider that the whole document must travel across the network between the database server and the application server when it is accessed.

In cases where only part of the document is used each time it is retrieved, it would be less resource intensive to split the document into smaller documents.

Indexing

Adding indexes to your collections can significantly increase query performance, as MongoDB can quickly navigate the index to find the relevant document by key instead of scanning each document in the collection.

The following shows a simplified depiction of how the system is able to navigate the index to find

the relevant information (in this case the user with the surname of Straub) without having to scan

each and every document in the collection.

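[Figure: index tree of user surnames. Root: King. Second level: Harris, Rice. Leaves: Bachman, Graham, Koontz, Straub.]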

MongoDB automatically creates an index on the _id field but additional indexes can be added

as required.
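
The benefit of navigating an index instead of scanning every document can be sketched in plain JavaScript. This is only an illustration of the idea: a sorted array with a binary search stands in for MongoDB's actual index structure, and the function names buildIndex and findByIndex are hypothetical.

```javascript
// Build a sorted list of [key, documentPosition] pairs that stands in
// for an index on the given field.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => [doc[field], position])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search over the index. Returns the matching document (or null)
// together with the number of index entries that were inspected.
function findByIndex(docs, index, key) {
  let low = 0;
  let high = index.length - 1;
  let steps = 0;
  while (low <= high) {
    const mid = (low + high) >> 1;
    steps++;
    if (index[mid][0] === key) return { doc: docs[index[mid][1]], steps };
    if (index[mid][0] < key) low = mid + 1;
    else high = mid - 1;
  }
  return { doc: null, steps };
}

const users = ["King", "Harris", "Rice", "Straub", "Koontz", "Bachman", "Graham"]
  .map((lastname) => ({ lastname }));
const index = buildIndex(users, "lastname");
const result = findByIndex(users, index, "Straub");
```

With seven users the lookup inspects far fewer entries than a full scan would, and the gap widens as the collection grows.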

Adding indexes

Filter Criteria

The fields that indexes are applied to depend on the queries that are run. In our use case the system will 'Search for products based on SKU', so we can define an index on the SKU field of the document.

Sorting

Based on the 'Search cart total per date (ordered)' system action we would also need to add an index on the date, as the query is sorted by date. Adding an index on the field that is sorted on enables MongoDB to sort the data without having to open each document.
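
The two indexes described above could be created along the following lines. This is a sketch under stated assumptions: db is taken to be a shell-style database handle, the collection names products and cart and the field names sku and sales_date follow the sample documents of this case study, and the wrapper function createCaseStudyIndexes is hypothetical. The createIndex call is the name used by current shells; older shells call the same operation ensureIndex.

```javascript
// `db` is assumed to be a shell-style database handle that exposes the
// `products` and `cart` collections as properties.
function createCaseStudyIndexes(db) {
  // Filter criteria: find a product by its SKU.
  db.products.createIndex({ sku: 1 });
  // Sorting: report cart totals ordered by sale date (ascending).
  db.cart.createIndex({ sales_date: 1 });
}
```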

Considerations

The following must be taken into consideration when applying indexes:

• Additional Overhead: Values are added to or removed from an index whenever documents are added to or removed from the collection. This does not pose a problem in systems that do mostly read operations, but in write-heavy systems it may incur significant overhead as the index must be continuously updated.
• Initial Index Blocking: No queries can be run against the database while an index is first being built, unless the {background:true} option[9] is used.
• Case Sensitive: MongoDB indexes are case sensitive.
• Indexes per Collection: There is a limit of 40 indexes per collection. In most cases this number is more than sufficient.
• Index Key Size: The maximum key length that can currently be indexed is 800 bytes.

Query Optimization

As with applying indexes on a relational database, you sometimes get unexpected results so it is

good practice to verify that the query uses the intended index and that using the index actually

results in better performance. This can be done by examining the query execution plan by issuing

the explain()[10] command.

Sharding

Automatic Sharding

MongoDB supports automatic sharding[1], where data is automatically spread across multiple servers in order to distribute the transaction load. The system accomplishes this by dividing the data into pieces (called chunks[2]) that are stored across multiple servers. According to the sources used for this document, each chunk can grow to a maximum of 200 MB by default, and this maximum can be overridden to be larger. Note that the default chunk size differs between MongoDB releases.

Once a chunk reaches approximately 50%-75% (100 MB to 150 MB) of the maximum size, MongoDB will create a snapshot of the chunk and copy the snapshot data to a new chunk. Writes can still be made to the original chunk while this copy operation is in progress. Once the copy is complete, the changes made to the original chunk in the meantime are applied to the new chunk before it is made available.

Sharding Key

MongoDB uses a key called a shard key to decide to which chunk data will be allocated. The shard key is often based on the _id field, which by default holds a BSON ObjectId (see BSON ObjectId Specification[3]), but it can be defined on any other field or combination of fields.

A shard key for user documents could, for example, be based on the user's last name. With that in mind, imagine that we have three chunks of user data. The first chunk may contain all the users that have a surname starting with B to H, the second Ki to Ko and the third R to S.

[Figure: three chunks of user documents. Chunk 1: Bachman, Richard ... Harris, Thomas. Chunk 2: King, Stephen ... Koontz, Dean. Chunk 3: Rice, Anne ... Straub, Peter.]

If a user with a last name of Barker is added, it will be written to the first chunk, whereas a user with a

last name of Smith will be written to the last chunk.
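
The routing of a document to a chunk by its shard key can be sketched as a simple range lookup. The chunk table below is hypothetical: the shard names are invented, the boundaries are chosen so that the three ranges cover every surname without gaps, and the strings "" and "\uffff" merely stand in for the lowest and highest possible key.

```javascript
// Hypothetical chunk table: each chunk owns the key range [min, max).
const chunks = [
  { shard: "shard1", min: "", max: "K" },        // surnames before "K" (Bachman ... Harris)
  { shard: "shard2", min: "K", max: "R" },       // "K" up to, but not including, "R" (King ... Koontz)
  { shard: "shard3", min: "R", max: "\uffff" },  // "R" and later (Rice ... Straub)
];

// Find the chunk whose range contains the given shard key value.
function chunkFor(lastname) {
  return chunks.find((c) => lastname >= c.min && lastname < c.max);
}
```

With this table, chunkFor("Barker") resolves to the first chunk and chunkFor("Smith") to the last one, matching the example above.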

Considerations

Deciding on the correct shard key may be one of the most significant design decisions that are

made during the design process as it could have a major impact, positive or negative, on system

performance. The following are some considerations to note.

Using the _id (or date based data) as the shard key

MongoDB automatically adds an _id attribute to each document (if not supplied by application code) and populates it with a unique value (see BSON ObjectId Specification[3]). The ObjectId consists of a few values that are concatenated to form a (relatively) unique value. The first part of this value is derived from the current date and time.

This could be an advantage as data is automatically stored in date order which would increase

performance of queries that query data by date range or need to order results by date. This fact

can also be exploited in other ways. For example most drivers support extracting the creation date

and time from the _id which means that storing a 'created at' value in the document is not required.
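
Extracting the creation time from an _id can be sketched in plain JavaScript, assuming the ObjectId is available as its 24-character hexadecimal string. The function name creationDate is hypothetical; drivers and the shell typically offer their own helper for this (for example getTimestamp() on the shell's ObjectId).

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId hold the creation
// time as seconds since the Unix epoch.
function creationDate(objectIdHex) {
  const seconds = parseInt(objectIdHex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id of the sample "Ipad" product document used in this case study.
const created = creationDate("4e1b091559a4f01109000000");
```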

On the other hand, according to the MongoDB website, it could also have some implications for scalability. With a date-based key, for example, all documents written at the beginning of each month will go to the same server until the data chunks are migrated across to other servers. This issue can be mitigated by adding some uniqueness to the key and pre-splitting chunks[7].

Read / Write Ratio

The read / write ratio that the system will experience must also be carefully considered. If the

system experiences many reads it would be better for performance if the whole query can be

satisfied from one shard and preferably one document. Alternatively if the system experiences

many writes it would be better if the shard keys are defined in such a way that the writes are

distributed between multiple servers in order to spread the workload. This can be achieved by

adding more uniqueness to the shard key.

If the system experiences an exceptionally high number of writes, then the way that the MongoDB balancer handles the splitting of chunks could also become an issue, as described in the 'MongoDB Pre-Splitting for Faster Data Loading and Importing'[8] article.

Related Data

Keeping related data close together will improve system performance as all the data can be

retrieved from one chunk or shard. In a system with lots of user related content we may prefix the

shard key with the user id. We could 'force' the system to store different documents containing user

related information like personal data, uploaded media and purchase history close together by

prefixing each document _id with the particular user id.

Unique Keys

Shard keys should normally be as unique as possible. MongoDB can only split the data if the key can be divided into smaller ranges. Depending on the system, performance issues may start appearing once chunks grow past the default 200 MB maximum size.

For example, using State (e.g. Texas or Ohio) as the shard key for user-related data may cause problems in the future, as MongoDB will have to write the data for ALL users that live in a particular state to the same chunk; because it cannot split that chunk, it would grow to be very large.

If the key is changed to include City, MongoDB can create a chunk for each State+City combination, which allows for a lot more granularity.

If it is also considered that each State+City chunk is potentially stored on a different server and that some cities have more users than others, it becomes clear that some servers will experience higher loads than others.
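
The difference in granularity, and the uneven load that remains, can be illustrated by counting documents per shard key value. The sample users and the helper countPerKey below are hypothetical and only serve to make the State versus State+City comparison concrete.

```javascript
// Hypothetical sample of user documents.
const users = [
  { state: "Texas", city: "Austin" },
  { state: "Texas", city: "Houston" },
  { state: "Texas", city: "Houston" },
  { state: "Ohio", city: "Columbus" },
  { state: "Ohio", city: "Cleveland" },
];

// Count how many documents share each distinct value of the shard key fields.
function countPerKey(docs, fields) {
  const counts = new Map();
  for (const doc of docs) {
    const key = fields.map((f) => doc[f]).join("/");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const byState = countPerKey(users, ["state"]);             // 2 distinct keys
const byStateCity = countPerKey(users, ["state", "city"]); // 4 distinct keys
```

The compound key yields more distinct values to split on, yet "Texas/Houston" still holds more users than the other combinations, which is the uneven load described above.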

Result Order

The order in which search results are returned to the client can also affect the selection of an

appropriate shard key. Continuing with the State / City example let us imagine that we defined a

shard key of {state:1,city:1} on our data and that the relevant data returned by a query is

stored on multiple servers.

• If the query returns data ordered by city, each server will need to compile the search results and then sort the data. The data is then returned from each server and the results are merged into one by the mongos process (see the Deployment section). The extra sorting step is needed because there is no index defined on the city field alone, only on the combination of State+City.
• If, on the other hand, the query sorts by state or by state+city, then each server will compile the data and stream it back in order to the mongos process without having to sort and merge the results, as it will be able to utilise the defined index.

Bringing it all together

After reviewing the system functionality as well as some of the best practices and considerations

we are able to create our document schemas and define the queries that will be run against the

system.

Entities

Based on the 'Identify Entities and Fields' section we can assume that the documents would

resemble the following samples. The structure and content of these documents may change further

as the different actions are considered in the following section.

Product

Each product document will have the following structure and will be allocated to the products

collection. Categories will also be stored in the product document but will be discussed in detail in

a subsequent section.

Collection: products

    {
        "_id": ObjectId("4e1b091559a4f01109000000"),
        "name": "Ipad",
        "sku": "10001-23424-9098",
        "cost_price": 300,
        "selling_price": 320,
        "items_in_stock": 9,
        "reorder_threshold": 10
    }
    {
        "_id": ObjectId("4e1b08e159a4f01608000000"),
        "name": "Ipod Nano",
        "sku": "10001-23424-9098",
        "cost_price": 100,
        "selling_price": 120,
        "items_in_stock": 10,
        "reorder_threshold": 15
    }

Category

Category documents will be allocated to the categories collection and will not be sharded, as all the category documents together make up a relatively small amount of data. We will also override the default generated _id as it is very long; the reason for this will be explained later on. Fortunately, categories will not be updated often, which means that the performance hit of using a custom incremental _id for categories is acceptable.

Collection: categories

    { "_id": "1", "name": "Electronics", "subcats": ["2", "3"] }
    { "_id": "2", "name": "Cellular", "parents": ["1"], "subcats": ["3"] }
    { "_id": "3", "name": "Nokia", "parents": ["1", "2"] }

User

User documents will be allocated to their own collection called users.

Collection: users

    {
        "_id": ObjectId("4e1bfba789a4f02207000000"),
        "firstname": "John",
        "lastname": "Doe",
        "email": "[email protected]",
        "password": "[encrypted_text]",
        "password_salt": "[salt_text]",
        "shipping_address": {
            "address1": "33 Rainbow Road",
            "city": "Cape Town",
            "postal_code": "8000"
        },
        "role": "user"
    }

Shopping Cart

The shopping cart, the products in the cart and the payment made for the cart will always be queried together, which means that the data can be stored as one document. Each of the line items will become an array item in the document. Some of the product data is duplicated into the cart object, which prevents additional database lookups when completing actions like previewing the cart, generating an invoice or even reprinting an invoice a year after it was paid.

The payment details and some of the user details will also be stored in the document.

Collection: cart

    {
        "_id": ObjectId("4e1bfba559a4f02207000000"),
        "line_items": [
            {
                "_id": "1_4e1b091559a4f01109000000",
                "cost_price": 300,
                "name": "Ipad",
                "selling_price": 320,
                "sku": "10001-23424-9098",
                "qty": 2
            },
            {
                "_id": ObjectId("4e1b08e159a4f01608000000"),
                "cost_price": 100,
                "name": "Ipod Nano",
                "selling_price": 120,
                "sku": "10001-23424-9098",
                "qty": 1
            }
        ],
        "payment": {
            "card_number": "[encrypted_text]",
            "expiry": "11/12",
            "card_holder": "Mr J Doe"
        },
        "sales_date": "2011-07-12 09:45:36",
        "total": 760,
        "user": {
            "id": ObjectId("4e1bfba789a4f02207000000"),
            "name": "John Doe",
            "email": "[email protected]",
            "shipping_address": {
                "address1": "33 Rainbow Road",
                "city": "Cape Town",
                "postal_code": "8000"
            },
            "role": "user"
        }
    }
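
How the line items and the total of such a cart document fit together can be sketched in plain JavaScript. The function addToCart is hypothetical and works on an in-memory object rather than on the database; it applies the constraints stated earlier (at most 500 line items, one line item per product) and, as a simplification, uses plain strings for the product _id values.

```javascript
const MAX_LINE_ITEMS = 500;

// Add a product to the cart. If the product is already in the cart, the
// quantity of the existing line item is increased instead of adding a new one.
function addToCart(cart, product, qty) {
  const existing = cart.line_items.find((item) => item._id === product._id);
  if (existing) {
    existing.qty += qty;
  } else {
    if (cart.line_items.length >= MAX_LINE_ITEMS) {
      throw new Error("cart is limited to " + MAX_LINE_ITEMS + " line items");
    }
    // Copy the product data into the cart so that it can be read back
    // later without another product lookup.
    cart.line_items.push({ ...product, qty });
  }
  // Recalculate the cart total from the selling prices and quantities.
  cart.total = cart.line_items.reduce(
    (sum, item) => sum + item.selling_price * item.qty, 0);
  return cart;
}

const cart = { line_items: [], total: 0 };
addToCart(cart, { _id: "4e1b091559a4f01109000000", name: "Ipad", selling_price: 320 }, 1);
addToCart(cart, { _id: "4e1b091559a4f01109000000", name: "Ipad", selling_price: 320 }, 1);
addToCart(cart, { _id: "4e1b08e159a4f01608000000", name: "Ipod Nano", selling_price: 120 }, 1);
```

After these three calls the cart holds two line items (two Ipads and one Ipod Nano) and a total of 760, the same figures as in the sample cart document.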

Actions

The actions are not ordered as defined in the 'Identify System Operations' section, as some of the discussions build on previous ones.

Note: All of the following examples refer to the document examples defined in the 'Entities' section unless otherwise specified.

Search for product based on SKU

Add an index on the SKU field of the product document

Search for products by product name

Add an index on the name field of the product document

Search for products by category identifier

Based on one of the best practices it is better to combine all the information related to a specific

entity into one document so that the system can satisfy the query without having to retrieve

multiple documents.

That would suggest that we save all of the products into the specific category document. In a

normal e-commerce system we will have hundreds or thousands of products over time which will

result in very large documents.

We could opt to model the product and category entities as separate documents, which means that these documents must somehow reference each other.

In our design we will add the category _id to the product document like this:

    {
        "_id": ObjectId("4e1b091559a4f01109000000"),
        "name": "Ipad",
        ....
        "category": "10"
    }

We could then add an index on the category field in order to quickly find all products in a particular category.

We could alternatively embed the whole category document in the category field if required. This approach would take more disk space because of the duplicated data, but if the product needs to be displayed on the front end together with category information it could prevent an extra query to the database. This may only be an option if the category information is relatively static.

In cases where a product can belong to multiple categories we could use an array of category ids.

    {
        "_id": ObjectId("223b091559a4f01109000000"),
        "name": "Nokia",
        ....
        "categories": ["1", "2"]
    }

Querying for a specific value in an array field is supported by MongoDB with the Multikey feature[13].

Increment / decrement stock item

The items_in_stock field will in essence be a counter that is incremented or decremented when an

item is added to stock or sold. In this case the system does not need to return the document to the

client. The system is able to increment / decrement the document in place.

Updating a document[14] will normally take this form:

var product = prodCollection.findOne({_id: ObjectId("4e1b091559a4f01109000000")});
product.items_in_stock++;
prodCollection.save(product);

However, we can use a modifier[15], which is much more efficient and can be used for atomic updates[16]

on the document. We will most probably query for a product by _id which automatically has an

index defined on it.

Use the following to increment the items_in_stock without retrieving the whole document (note the

$inc operator):

db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : 1}});

or the following to decrement the stock level:

db.products.update ( { _id : ObjectId( "497ce4051ca9ca6d3efca323" ) }, { $inc: { items_in_stock : -1}});
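Both statements differ only in the sign of the increment, so they can be produced by one small helper. The sketch below is illustrative: `buildStockChange` is a hypothetical name, and the extra `$gte` condition on decrements is an added assumption (not part of the design above) that stops the counter from dropping below zero.

```javascript
// Builds the two arguments for db.products.update(query, update).
// `productId` is whatever value the product's _id holds; in the shell
// that would typically be ObjectId("...").
function buildStockChange(productId, delta) {
  if (typeof delta !== "number" || delta % 1 !== 0 || delta === 0) {
    throw new Error("delta must be a non-zero whole number");
  }
  var query = { _id: productId };
  if (delta < 0) {
    // Assumption: only match when enough stock remains, so the update
    // becomes a no-op instead of producing a negative stock level.
    query.items_in_stock = { $gte: -delta };
  }
  return { query: query, update: { $inc: { items_in_stock: delta } } };
}

var sale = buildStockChange("497ce4051ca9ca6d3efca323", -1);
// In the shell: db.products.update(sale.query, sale.update);
console.log(JSON.stringify(sale));
```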

Add / Edit products

The most important aspect when editing data is deciding on the shard key as this will influence

which shard the data will be written to and how the data will be located during queries.

Adding and editing products will not happen that often in comparison to other types of transactions

which means that the default _id should be sufficient to be used as the shard key. But considering

that searching for products by category is a high volume transaction we could concatenate the

category _id to the product id so that all products in a category are grouped together as shown in

the following example:

{ "_id": "1", "name": "Electronics"}

{ "_id": "1_4e1b091559a4f01109000000", "name": "Ipad", .... "category" : "1"}

Another side effect of prepending the category, for systems where a product can only belong to

one category, is that we potentially do not have to store the category as a separate field as it can

be derived from the product _id.
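A minimal sketch of building and reading such a prefixed key is shown below. The helper names and the choice of "_" as separator follow the example above but are otherwise assumptions; the sketch also assumes category ids never contain the separator.

```javascript
var SEPARATOR = "_";

// Joins the category id and the unique part into one shard-key friendly _id.
function makeProductId(categoryId, uniquePart) {
  if (String(categoryId).indexOf(SEPARATOR) !== -1) {
    throw new Error("category id must not contain " + SEPARATOR);
  }
  return categoryId + SEPARATOR + uniquePart;
}

// Recovers the category id from a prefixed product _id.
function categoryOf(productId) {
  var cut = productId.indexOf(SEPARATOR);
  if (cut === -1) {
    throw new Error("not a prefixed product id: " + productId);
  }
  return productId.substring(0, cut);
}

var id = makeProductId("1", "4e1b091559a4f01109000000");
console.log(id);             // 1_4e1b091559a4f01109000000
console.log(categoryOf(id)); // 1
```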

Create Shopping Cart

Problem

As with the product entity, careful consideration is required when deciding what the shopping cart

shard key will consist of. In a high transaction volume environment there will be a tremendous

number of writes as new shopping carts are created and items are added and removed.

Once a cart is paid for, it will mostly be read for reporting purposes.

This makes the design difficult. Adding indexes, for example, will allow for fast retrieval of the data after

payment but will hurt write performance while the purchase is in progress. Also, choosing a shard key

related to date will allow for better querying of the data but is dangerous, as it could mean that

all writes go to the same shard instead of being spread out over many shards.

Running regular data-intensive reporting queries could also hurt system performance and

potentially affect the user experience.

Define the correct shard key

In our case we will avoid using the default generated _id as it will cause excessive writes to one

server at some times during the month. A similar issue was described in the last paragraph of the

'Using the _id or date based data as the shard key' section.

There are many different ways to generate a unique number that can be used for the _id of your

document. Most approaches combine a couple of values to get a unique value. In some systems it

may be sufficient to concatenate the user id and the date. We could even be more inventive and

use the application server name that the transaction was generated on or even use the

hexadecimal representation of the user's IP address[19] (e.g. IP 196.134.96.111 = hex C4 86 60 6F)

to help make values unique.
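The IP conversion mentioned above is simple to reproduce. The sketch below converts a dotted IPv4 address to its hexadecimal form and combines it with a user id and date; `makeCartKey` and its "-" separated layout are hypothetical and only illustrate the idea of concatenating several values.

```javascript
// Converts a dotted IPv4 address to eight upper-case hex digits.
function ipToHex(ip) {
  var parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error("expected four octets: " + ip);
  }
  return parts.map(function (part) {
    var octet = Number(part);
    if (!/^\d+$/.test(part) || octet > 255) {
      throw new Error("bad octet: " + part);
    }
    var hex = octet.toString(16).toUpperCase();
    return hex.length === 1 ? "0" + hex : hex; // pad to two digits
  }).join("");
}

// Hypothetical key layout: user id, date and client address joined by "-".
function makeCartKey(userId, isoDate, ip) {
  return [userId, isoDate, ipToHex(ip)].join("-");
}

console.log(ipToHex("196.134.96.111")); // C486606F
console.log(makeCartKey("u42", "2011-07-19", "196.134.96.111")); // u42-2011-07-19-C486606F
```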

In our use case we will keep it simple and use a GUID [20] for a unique key. We could also pre-split

chunk[7] data if necessary. This will ensure that write operations to the cart are distributed across

many shards.

Adding and removing line items to/from the cart can be done most efficiently by using the $push

and $pull[14] modifiers, which add items to and remove items from the document in place. Because an index is

automatically added to the _id field, finding the documents by _id will also be fast.

And lastly, once a cart is paid we can use the $set[14] modifier to add payment details.
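The three cart operations map onto three small update documents. The sketch below only builds those documents; the field names `items`, `sku` and `payment`, the collection name `carts` and the helper names are assumptions made for illustration, not part of the schema defined earlier.

```javascript
// Each helper returns the second argument for db.carts.update(query, update).
function addLineItem(item) {
  return { $push: { items: item } };          // append to the items array
}

function removeLineItem(sku) {
  return { $pull: { items: { sku: sku } } };  // drop array entries with this sku
}

function recordPayment(payment) {
  return { $set: { payment: payment } };      // add payment details in place
}

// In the shell, for a cart identified by cartId:
//   db.carts.update({ _id: cartId }, addLineItem({ sku: "IPAD-16", qty: 1 }));
console.log(JSON.stringify(addLineItem({ sku: "IPAD-16", qty: 1 })));
console.log(JSON.stringify(removeLineItem("IPAD-16")));
console.log(JSON.stringify(recordPayment({ method: "credit_card" })));
```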

Split read and write data

Data volumes in this collection will eventually grow very large and may affect performance. This is

especially true if it is considered that reporting queries and other search queries will be done

against the same database. We will therefore use two collections, one for 'active' carts and another

for 'completed' carts. The active cart collection will have no additional indexes in order to cater for

the frequent updates whereas the completed cart collection will have more indexes to cater for the

different search queries.

Moving data between collections will cause extra overhead on the system so we will split this

processing into different parts. We will assume that real time (or as close to it as possible) reporting

is required which means that we cannot use a deferred job that will move the data during a low

transaction volume period like midnight to 3 AM.

There are various approaches but we will go with a more complex option in order to demonstrate

some of MongoDb's lesser-known features.

When a cart is paid and payment details are saved we will add an additional field called

'processed' using the $set[14] modifier. This field will have a sparse[5] index defined on it. Sparse

indexes only include documents that contain the field that the index is defined on.

A separate server process will query the database at intervals and retrieve all the documents that

have a 'processed' field in the document. Because of the sparse index it will be a very efficient

query and will not affect write queries as only documents containing the 'processed' field will be

included in the index. These documents will be retrieved and saved into the second collection and

once the document is moved the 'processed' field will be removed from the original. Because the

field is removed that document will not be returned on subsequent 'data move' queries. Care needs

to be taken to ensure that both updates happen in an atomic[16] fashion.

A third process will run during a low transaction volume period. This job will remove all documents

from the first collection that exist in the second.
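The move and clean-up steps can be walked through on plain arrays. The sketch below is a simulation only: the arrays stand in for the 'active' and 'completed' collections, the function names and sample carts are hypothetical, and a real implementation would issue queries and `$unset` updates against MongoDb instead.

```javascript
// In-memory stand-ins for the 'active' and 'completed' cart collections.
var active = [
  { _id: "a1", total: 120, processed: true }, // paid, waiting to be moved
  { _id: "a2", total: 35 }                    // still being shopped
];
var completed = [];

// Move step: copy every cart that carries the 'processed' marker, then
// clear the marker so the cart is skipped on the next run.
function moveProcessedCarts(activeCarts, completedCarts) {
  var moved = 0;
  activeCarts.forEach(function (cart) {
    if (!cart.hasOwnProperty("processed")) return; // absent from a sparse index
    var alreadyCopied = completedCarts.some(function (c) {
      return c._id === cart._id;
    });
    if (!alreadyCopied) {
      var copy = JSON.parse(JSON.stringify(cart));
      delete copy.processed;
      completedCarts.push(copy);
    }
    delete cart.processed; // shell equivalent: { $unset: { processed: 1 } }
    moved++;
  });
  return moved;
}

// Off-peak step: drop active carts that already exist in 'completed'.
function purgeMovedCarts(activeCarts, completedCarts) {
  return activeCarts.filter(function (cart) {
    return !completedCarts.some(function (c) { return c._id === cart._id; });
  });
}

console.log(moveProcessedCarts(active, completed)); // 1
console.log(moveProcessedCarts(active, completed)); // 0, the marker was removed
active = purgeMovedCarts(active, completed);
console.log(active.length, completed.length);       // 1 1
```

Checking for an existing copy before inserting keeps the move step safe to repeat if the process is interrupted between the copy and the marker removal.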

Add / Remove products to / from shopping cart

See 'Create shopping cart' section.

Pay for cart by credit card

See 'Create shopping cart' section.

Search for all categories

Due to the static nature of the category data it would most probably be cached on the client

application server instead of being queried for continuously. This means that nothing additional

would be required for this query except possibly an index on the category name if the result of the

'Search all categories' query must be sorted by name.

Search for products less than reorder threshold

Finding documents where the value of a field is less than another value can be completed with the

first query below but MongoDb does not yet support using the $lt operator with another field name, as

shown in the second query.

> db.products.find({items_in_stock: {$lt:20}})
{ "_id" : ObjectId("4e1b091559a4f01109000000"), "items_in_stock" : 9, "name" : "Ipad", "reorder_threshold" : 10 }
> db.products.find({items_in_stock: {$lt:reorder_threshold}})
Mon Jul 11 14:43:28 ReferenceError: reorder_threshold is not defined (shell):0

We are able to make use of a mapreduce[17] function though.

In this use case the query will access all the product documents in the collection; it does not have

any filter criteria and does not require sorting, which makes it a good option for map-reduce. The

following example is adapted from the 'Finding Max And Min Values for a given Key' article[18].

Based on the example data (Entities section) the result is expected to look like this:

{ _id : "1_497ce4051ca9ca6d3efca323", value : { product : { name : "Ipod Nano" , items_below_level : 5 } } }
{ _id : "1_678ce4051ca9ca6d3efca323", value : { product : { name : "Ipad" , items_below_level : 1 } } }

Explaining map / reduce is out of scope of this document but suffice it to say that the functions are

applied to each document. Our map function checks whether the items in stock for a

particular product are below the set threshold and, if they are, emits the value. The reduce function

is normally used to aggregate values (e.g. sums, counts and averages) but not in our case, so

the function just returns the result.

> map = function () { if (this.items_in_stock < this.reorder_threshold) {
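A rough, self-contained sketch of such a map and reduce pair follows, run against an in-memory array rather than a live collection. The `emit` stand-in, the sample documents and the exact shape of the emitted value (modelled on the expected result above) are assumptions; in the shell `emit` is supplied by MongoDb and the functions would be passed to `db.products.mapReduce`.

```javascript
// Minimal stand-in for the shell's emit(): collects whatever map() emits.
var emitted = [];
function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// Emits a product only when its stock is below its own reorder threshold.
var map = function () {
  if (this.items_in_stock < this.reorder_threshold) {
    emit(this._id, {
      product: {
        name: this.name,
        items_below_level: this.reorder_threshold - this.items_in_stock
      }
    });
  }
};

// Every _id is emitted at most once, so there is nothing to aggregate.
var reduce = function (key, values) {
  return values[0];
};

// Hypothetical sample documents.
var sampleProducts = [
  { _id: "1_678ce4051ca9ca6d3efca323", name: "Ipad", items_in_stock: 9, reorder_threshold: 10 },
  { _id: "1_111ce4051ca9ca6d3efca323", name: "Kindle", items_in_stock: 30, reorder_threshold: 10 }
];

// Apply map to each document, with the document bound to `this`.
sampleProducts.forEach(function (doc) { map.call(doc); });
console.log(JSON.stringify(emitted));
```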

Search total product value

As mentioned, aggregating multiple values is another use for mapreduce[17] and in this use case we

need to query the system and find the sum total of the cost_price of all the products that are in

stock. The following shows how this can be achieved:

> map = function () {
    emit("sub_total", this.items_in_stock * this.cost_price);
}
> reduce = function (key, values) {
    var grand_total = 0;
    for (var i = 0; i < values.length; i++) {
        grand_total += values[i];
    }
    return grand_total;
}
> db.products.mapReduce(map, reduce, {out:{inline : true}});
{
    "result" : "tmp.mr.mapreduce_1310392963_13",
    "timeMillis" : 3,
    "counts" : {

    },
    "ok" : 1,
}
> db.tmp.mr.mapreduce_1310392963_13.find()
{ "_id" : "sub_total", "value" : 4200 }

Search cart total per date

As with relational databases it is sometimes feasible to store aggregated data. To satisfy this query

we will add an extra field to the document called 'total' and populate it with the cart total during the

data move step described in the 'Create Shopping Cart' section. We will then use a map-reduce

query to group the totals together by date.

> map = function () {
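The grouping idea can be tried out without a database. In the sketch below the `emit` stand-in, the `paid_date` field (assumed to hold a 'YYYY-MM-DD' string) and the sample carts are hypothetical; only the `total` field comes from the text above.

```javascript
// Stand-in for the shell's emit(): groups emitted values by key.
var groups = {};
function emit(key, value) {
  (groups[key] = groups[key] || []).push(value);
}

// One emit per cart: the payment date is the key, the cart total the value.
var map = function () {
  emit(this.paid_date, this.total);
};

// Sums all cart totals that share a date.
var reduce = function (key, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
};

// Hypothetical completed carts.
var carts = [
  { paid_date: "2011-07-11", total: 100 },
  { paid_date: "2011-07-11", total: 50 },
  { paid_date: "2011-07-12", total: 75 }
];
carts.forEach(function (cart) { map.call(cart); });

var totals = Object.keys(groups).map(function (date) {
  return { _id: date, value: reduce(date, groups[date]) };
});
console.log(JSON.stringify(totals));
// [{"_id":"2011-07-11","value":150},{"_id":"2011-07-12","value":75}]
```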

Infrastructure

Deployment

Based on the MongoDb documentation[11] we will start with a setup as shown in the following

diagram. This setup ensures that queries are distributed across multiple shards, which improves

performance; it ensures that there are three replicas of the data available (each of the servers in

the replica set[12]) and it allows for disaster recovery scenarios by replicating to servers in another

data centre.

Mongo Processes

The bulk of MongoDb processing is handled by the processes depicted in the diagram. Mongod is

the main database process. It completes the actual querying and editing of the data contained in

the database.

Mongos, on the other hand, is only a routing service. A client application will communicate with the

mongos process which in turn will query the configuration store (config mongod in the diagram) to

find out which shard(s) to communicate with. It will then route the query to the appropriate shard(s)

and merge the results from the different shards where applicable, before it returns the combined

result to the client application. This method ensures that the client application only needs to be

aware of one process to communicate with and does not have to have intimate knowledge of all

the mongod processes.
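The routing step can be pictured with a small sketch. It is an illustration only: the chunk table, the shard names and the `shardFor` helper are hypothetical, and the real mongos works from the metadata held by the config servers rather than from a hard-coded array.

```javascript
// Hypothetical chunk table: each chunk owns the shard-key values in
// [min, max) and lives on exactly one shard.
var chunks = [
  { min: "",  max: "8", shard: "shard1" },
  { min: "8", max: "g", shard: "shard2" }
];

// Finds the shard whose chunk range covers the given shard-key value.
function shardFor(shardKey) {
  for (var i = 0; i < chunks.length; i++) {
    if (shardKey >= chunks[i].min && shardKey < chunks[i].max) {
      return chunks[i].shard;
    }
  }
  throw new Error("no chunk covers key " + shardKey);
}

console.log(shardFor("4e1b091559a4f01109000000")); // shard1
console.log(shardFor("c486606f"));                 // shard2
```

A query that includes the shard key can be sent to a single shard in this way; a query without it has to be sent to every shard and the results merged.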

Note that the mongos process can be run in many different configurations. It can be installed on

all of the servers or only on some. It can also be installed on separate servers with no mongod

processes installed. There may be a performance boost if the service is installed on each server as

it will be able to communicate over the localhost interface.

Replica Sets[12]

A replica set consists of two or more servers with the mongod process installed. One server in a

replica set will be 'nominated' as master and will service all read and write requests. If the master

fails or becomes unavailable, one of the slaves will automatically become the master and start serving

requests.

Operating System

MongoDb uses memory-mapped files to manage data which means that the database size is

limited to 2 GB on 32-bit operating systems. Use a 64-bit operating system to support databases

over 4 TB.

RAM

MongoDb uses memory-mapped files to manage data which allows it to map data in memory as it

appears on the hard disk. MongoDb will keep data in memory once it is queried for the first time (if

possible) and use the in memory data for subsequent queries which is more efficient than reading

from disk. Having a lot of memory available could speed up queries significantly as the whole

database could potentially be loaded into memory.

Network

Setting up replication and backups will increase network traffic which could affect the query

performance. Adding an extra network card and creating a separate network on which the servers

can communicate with replication and backup servers could reduce network 'noise'.

Next Steps

References

1. Sharding: http://www.mongodb.org/display/DOCS/Sharding+Introduction
2. Chunk: http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-Chunks
3. BSON Object: http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
4. Choosing a Shard Key: http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
5. Indexing: http://www.mongodb.org/display/DOCS/Indexes
6. MongoTips: http://mongotips.com/b/a-few-objectid-tricks/
7. Splitting Chunks: http://www.mongodb.org/display/DOCS/Splitting+Chunks
8. MongoDB Pre-Splitting for Faster Data Loading and Importing: http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
9. Indexing as a Background Operation: http://www.mongodb.org/display/DOCS/Indexing+as+a+Background+Operation
10. Explain: http://www.mongodb.org/display/DOCS/Explain
11. Simple Initial Sharding Architecture: http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
12. Replica Sets: http://www.mongodb.org/display/DOCS/Replica+Sets
13. Multikeys: http://www.mongodb.org/display/DOCS/Multikeys
14. Update: http://www.mongodb.org/display/DOCS/Updating#Updating-update%28%29
15. Modifiers: http://www.mongodb.org/display/DOCS/Updating#Updating-ModifierOperations
16. Atomic Operations: http://www.mongodb.org/display/DOCS/Atomic+Operations
17. Map Reduce Basics: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
18. Finding Max And Min Values for a given Key: http://cookbook.mongodb.org/patterns/finding_max_and_min_values_for_a_key/
19. Calculate the hex value of an IP address: http://www.pocketnes.org/hexa.html
20. GUID: http://en.wikipedia.org/wiki/Globally_unique_identifier
21. MongoDB Auto-sharding and Foursquare Downtime: http://nosql.mypopescu.com/post/1251523059/mongodb-auto-sharding-and-foursquare-downtime
