mongodb berlin schema design

Schema Design Basic schema modeling in MongoDB Alvin Richards Technical Director, EMEA [email protected] @jonnyeight


Post on 21-Apr-2015


DESCRIPTION

Thinking about schema design with MongoDB? In this talk we will cover the basics and discuss common patterns such as queues, trees, inventory etc.

TRANSCRIPT

Page 1: MongoDB Berlin Schema Design

Schema Design

Basic schema modeling in MongoDB

Alvin Richards

Technical Director, EMEA

[email protected]

@jonnyeight

Page 2: MongoDB Berlin Schema Design

Topics

Schema design is easy!
• Data as Objects in code

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Buckets
• Trees
• Queues
• Inventory

Page 3: MongoDB Berlin Schema Design

So today’s example will use...

Page 4: MongoDB Berlin Schema Design

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Page 5: MongoDB Berlin Schema Design

Schema Design - Relational Database

Page 8: MongoDB Berlin Schema Design

Schema Design - MongoDB

embedding

linking

Page 9: MongoDB Berlin Schema Design

Design Session

Design documents that simply map to your application

> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}

> db.posts.save(post)

Page 10: MongoDB Berlin Schema Design

Find the document

> db.posts.find()
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ] }

Notes:
• The _id must be unique within the collection, but can be almost any value you like (arrays are not allowed)
• MongoDB will generate a default _id if one is not supplied

Page 11: MongoDB Berlin Schema Design

Add an index, find via index

Secondary index for "author"

// 1 means ascending, -1 means descending
// (ensureIndex is the shell helper of the time; newer shells call it createIndex)
> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: ISODate("2011-09-18T09:56:06.298Z"),
    author: "Hergé",
    ... }

Page 12: MongoDB Berlin Schema Design

Examine the query plan

> db.posts.find({author: "Hergé"}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [ "Hergé", "Hergé" ]
    ]
  }
}

Page 16: MongoDB Berlin Schema Design

Query operators

Conditional operators:
  $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
  $lt, $lte, $gt, $gte, ...

// find posts with any tags
> db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with h (case-insensitive)
> db.posts.find({author: /^h/i})

Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()

Page 17: MongoDB Berlin Schema Design

Extending the Schema

> new_comment = {author: "Kyle",
                 date: new Date(),
                 text: "great book"}

> db.posts.update(
      {text: "Destination Moon"},
      {"$push": {comments: new_comment},
       "$inc":  {comments_count: 1}})
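To make the effect of that update concrete, here is a small in-memory sketch of what $push and $inc do to a single document. It is plain JavaScript, not MongoDB code: applyUpdate is a hypothetical helper written for illustration, and it only covers these two operators on top-level fields.

```javascript
// Hypothetical helper (not a MongoDB API): applies $push and $inc
// to a plain object in the way the shell update above would.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push creates the array when the field does not exist yet
    if (result[field] === undefined) result[field] = [];
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc treats a missing field as 0
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", text: "great book" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count);
```

The point of the sketch is that neither comments nor comments_count has to exist beforehand; the first comment creates both fields.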

Page 18: MongoDB Berlin Schema Design

Extending the Schema

> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ],
    comments: [
      { author: "Kyle",
        date: ISODate("2011-09-19T09:56:06.298Z"),
        text: "great book" }
    ],
    comments_count: 1 }

Page 21: MongoDB Berlin Schema Design

Extending the Schema

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check if you need an index

Page 22: MongoDB Berlin Schema Design

Use MongoDB with your language

10gen Supported Drivers
• Ruby, Python, Perl, PHP, JavaScript
• Java, C/C++, C#, Scala
• Erlang, Haskell

Object Data Mappers
• Morphia - Java
• Mongoid, MongoMapper - Ruby
• MongoEngine - Python

Community Drivers
• F#, Smalltalk, Clojure, Go, Groovy

Page 23: MongoDB Berlin Schema Design

Using your schema - using Java Driver

// Needs com.mongodb.* and java.util.* imports
// Get a connection to the database, then a collection
// ("posts" is assumed here to match the shell examples)
DBCollection coll = new Mongo().getDB("blogs").getCollection("posts");

// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Hergé");
obj.put("text", "Destination Moon");
obj.put("date", new Date());

// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));

Page 24: MongoDB Berlin Schema Design

Using your schema - using Morphia mapper

// Use Morphia annotations
@Entity
class Blog {
    @Id String author;
    @Indexed Date date;
    String text;
}

Page 25: MongoDB Berlin Schema Design

Using your schema - using Morphia

// Create the data store
Datastore ds = new Morphia().createDatastore();

// Create the Object
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");

// Insert object into MongoDB
ds.save(entry);

Page 26: MongoDB Berlin Schema Design

Common Patterns

Page 27: MongoDB Berlin Schema Design

Inheritance

Page 28: MongoDB Berlin Schema Design

Single Table Inheritance - RDBMS

shapes table

id   type     area   radius   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10              5        2

Page 29: MongoDB Berlin Schema Design

Single Table Inheritance - MongoDB

> db.shapes.find()
  { _id: "1", type: "circle", area: 3.14, radius: 1}
  { _id: "2", type: "square", area: 4, length: 2}
  { _id: "3", type: "rect",   area: 10, length: 5, width: 2}

missing values not stored!

Page 31: MongoDB Berlin Schema Design

Single Table Inheritance - MongoDB

> db.shapes.find()
  { _id: "1", type: "circle", area: 3.14, radius: 1}
  { _id: "2", type: "square", area: 4, length: 2}
  { _id: "3", type: "rect",   area: 10, length: 5, width: 2}

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})

A sparse index only holds entries for documents where the field is present!
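As a rough illustration of the sparse idea, the sketch below builds an in-memory list of index entries for radius and skips every shape that has no such field. This is only a mental model in plain JavaScript, not how MongoDB stores its indexes; buildSparseIndex is a hypothetical name.

```javascript
// In-memory stand-in for a sparse index on "radius" (illustration only):
// documents without the field get no index entry at all.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

function buildSparseIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    if (field in doc) entries.push({ key: doc[field], id: doc._id });
  }
  // keep entries ordered by key, ascending, like {radius: 1}
  return entries.sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");

// "radius > 0" can be answered by looking at the index entries only
const matches = radiusIndex.filter((e) => e.key > 0).map((e) => e.id);
console.log(radiusIndex.length, matches);
```

With three shapes in the collection, the index holds a single entry, which is why a sparse index suits fields that only some subtypes carry.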

Page 32: MongoDB Berlin Schema Design

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

Page 33: MongoDB Berlin Schema Design

One to Many

- Embedded Array
- $slice operator to return subset of comments
- some queries harder, e.g. find latest comments across all blogs

blogs:
  { author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    comments: [
      { author: "Kyle",
        date: ISODate("2011-09-19T09:56:06.298Z"),
        text: "great book" }
    ] }

Page 34: MongoDB Berlin Schema Design

One to Many

- Normalized (2 collections)
- most flexible
- more queries

blogs:
  { _id: 1000,
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    comments: [ {comment: 1} ] }

comments:
  { _id: 1,
    blog: 1000,
    author: "Kyle",
    date: ISODate("2011-09-19T09:56:06.298Z") }

// findOne returns the document itself rather than a cursor
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});

Page 35: MongoDB Berlin Schema Design

Linking versus Embedding

• When should I embed?• When should I link?

Page 36: MongoDB Berlin Schema Design

Activity Stream - Embedded

// users - one doc per user with all tweets
{ _id: "alvin",
  email: "[email protected]",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" }
  ] }

Page 37: MongoDB Berlin Schema Design

Activity Stream - Linking

// users - one doc per user
{ _id: "alvin",
  email: "[email protected]" }

// tweets - one doc per user per tweet
{ user: "bob",
  tweet: "20111209-1231",
  text: "Best Tweet Ever!" }

Page 38: MongoDB Berlin Schema Design

Embedding

• Great for read performance

• One seek to load entire object

• One roundtrip to database

• Writes can be slow if adding to objects all the time

• Should you embed tweets?

Page 39: MongoDB Berlin Schema Design

Activity Stream - Buckets

// tweets: one doc per user per day
{ _id: "alvin-20111209",
  email: "[email protected]",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" }
  ] }

Page 40: MongoDB Berlin Schema Design

Adding a Tweet

tweet = { user: "Bob",
          tweet: "20111209-1231",
          text: "Best Tweet Ever!" }

db.tweets.update( { _id: "alvin-20111209" },
                  { $push: { tweets: tweet } } );

Page 41: MongoDB Berlin Schema Design

Deleting a Tweet

db.tweets.update(
    { _id: "alvin-20111209" },
    { $pull: { tweets: { tweet: "20111209-1231" } } })
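Both bucket operations depend on knowing the bucket _id for a given user and day. A minimal sketch of deriving it follows; bucketId is a hypothetical helper, the "user-YYYYMMDD" layout is inferred from the "alvin-20111209" example, and the use of UTC dates is an assumption.

```javascript
// Hypothetical helper: derive the per-user, per-day bucket _id
// in the "alvin-20111209" layout shown above (UTC is an assumption).
function bucketId(user, date) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${yyyy}${mm}${dd}`;
}

const id = bucketId("alvin", new Date(Date.UTC(2011, 11, 9, 12, 31)));
console.log(id); // alvin-20111209
```

The first tweet of a day has no bucket document to push into yet, so the update would presumably need to run as an upsert; the slides do not show that detail.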

Page 42: MongoDB Berlin Schema Design

Many - Many

Example:
- Product can be in many categories
- Category can have many products

Page 45: MongoDB Berlin Schema Design

Many - Many

products:
  { _id: 10,
    name: "Destination Moon",
    category_ids: [ 20, 30 ] }

categories:
  { _id: 20,
    name: "adventure",
    product_ids: [ 10, 11, 12 ] }

categories:
  { _id: 21,
    name: "movie",
    product_ids: [ 10 ] }

// All categories for a given product
> db.categories.find({product_ids: 10})

Page 48: MongoDB Berlin Schema Design

Alternative

products:
  { _id: 10,
    name: "Destination Moon",
    category_ids: [ 20, 30 ] }

categories:
  { _id: 20,
    name: "adventure" }

// All products for a given category
> db.products.find({category_ids: 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
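The two lookups of this alternative layout can be sketched with plain arrays standing in for the collections. Everything here runs in memory and is for illustration only; the second product and the name of category 30 are made up, since the slides never show them.

```javascript
// Plain arrays standing in for the two collections (illustration only).
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "(made-up second product)", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" }, // made-up name; category 30 is never shown on the slides
];

// All products for a given category: match a value inside the array field
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product, then an $in-style lookup
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));
console.log(categoriesOfProduct(10).map((c) => c.name));
```

Keeping the ids on one side only means a single document changes when a product is recategorized, at the cost of a second query in one direction.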

Page 49: MongoDB Berlin Schema Design

Trees

Hierarchical information

   

Page 50: MongoDB Berlin Schema Design

Trees

Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...",
          replies: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit

   

Page 53: MongoDB Berlin Schema Design

Array of Ancestors

- Store all Ancestors of a node
  { _id: "a" }
  { _id: "b", thread: [ "a" ],      replyTo: "a" }
  { _id: "c", thread: [ "a", "b" ], replyTo: "b" }
  { _id: "d", thread: [ "a", "b" ], replyTo: "b" }
  { _id: "e", thread: [ "a" ],      replyTo: "a" }
  { _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all messages that have "b" as an ancestor
> db.msg_tree.find({thread: "b"})

// find replies to "e"
> db.msg_tree.find({replyTo: "e"})

// find history of "f"
> threads = db.msg_tree.findOne({_id: "f"}).thread
> db.msg_tree.find({ _id: { $in: threads } })

(Diagram: message tree with root A; B and E reply to A; C and D reply to B; F replies to E)
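The slides show the stored documents but not how the thread array gets filled in. One plausible way, sketched here in memory, is to copy the parent's ancestors and append the parent itself when a reply is created. makeReply is a hypothetical helper and the Map merely stands in for the msg_tree collection.

```javascript
// A few messages from the example, kept in a Map keyed by _id (in-memory stand-in).
const msgTree = new Map([
  ["a", { _id: "a" }],
  ["b", { _id: "b", thread: ["a"], replyTo: "a" }],
  ["e", { _id: "e", thread: ["a"], replyTo: "a" }],
]);

// Hypothetical helper: a reply's ancestors are its parent's ancestors plus the parent.
function makeReply(id, parentId) {
  const parent = msgTree.get(parentId);
  if (!parent) throw new Error(`unknown parent ${parentId}`);
  const reply = {
    _id: id,
    thread: [...(parent.thread || []), parent._id], // the root has no thread field
    replyTo: parent._id,
  };
  msgTree.set(id, reply);
  return reply;
}

const f = makeReply("f", "e");
console.log(f.thread); // [ 'a', 'e' ]

// Same idea as find({thread: "e"}): every message somewhere below "e"
const below = [...msgTree.values()].filter((m) => (m.thread || []).includes("e"));
console.log(below.map((m) => m._id));
```

Because each node carries its full ancestry, no recursive lookup is needed at read time; the trade-off is that moving a subtree means rewriting the thread array of every descendant.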

Page 54: MongoDB Berlin Schema Design

Trees as Paths

Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post",
      path: "" },
    { author: "Jim",  text: "jim’s comment",
      path: "jim" },
    { author: "Kyle", text: "Kyle’s reply to Jim",
      path: "jim/kyle" } ] }

// Find the conversations Jim was part of
// (path lives inside the comments array, so use dot notation)
> db.posts.find({"comments.path": /^jim/})
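A short in-memory sketch of the path approach: one helper builds a child path from its parent, another selects a subtree with an anchored prefix match. Both helper names are hypothetical. Note the match here is slightly stricter than a bare /^jim/, which would also match a path such as "jimmy".

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
];

// Hypothetical helper: child path = parent path + "/" + node name
function childPath(parentPath, name) {
  return parentPath === "" ? name : `${parentPath}/${name}`;
}

// Anchored prefix match. Requiring "/" or end-of-string after the prefix
// keeps "jim" from matching an unrelated path such as "jimmy".
function subtree(prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`^${escaped}(/|$)`);
  return comments.filter((c) => re.test(c.path));
}

console.log(childPath("jim", "kyle")); // jim/kyle
console.log(subtree("jim").map((c) => c.path));
```

Anchoring the expression at the start of the string matters for more than correctness: a prefix expression is the kind of regular expression that can make use of an index on the path field.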

Page 55: MongoDB Berlin Schema Design

Queue

• Need to maintain order and state
• Ensure that updates are atomic

db.jobs.save(
    { inprogress: false,
      priority: 1,
      ...
    });

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
          query:  {inprogress: false},
          sort:   {priority: -1},
          update: {$set: {inprogress: true,
                          started: new Date()}},
          new: true})
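The claim step can be pictured with an in-memory sketch: pick the highest-priority job that is not in progress, mark it, and hand back the modified job. claimNextJob is a hypothetical helper and the sample jobs are made up. Single-threaded JavaScript makes the sequence trivially atomic here, whereas on the server it is findAndModify that provides the guarantee for one document.

```javascript
// In-memory stand-in for the jobs collection (made-up sample data).
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 },
];

// Hypothetical helper mirroring the findAndModify call above:
// query {inprogress: false}, sort {priority: -1}, $set the two fields,
// and return the modified job (the "new: true" behaviour).
function claimNextJob(now = new Date()) {
  const candidates = jobs
    .filter((j) => !j.inprogress)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null; // queue is empty
  const job = candidates[0];
  job.inprogress = true;
  job.started = now;
  return job;
}

const first = claimNextJob();
const second = claimNextJob();
const third = claimNextJob();
console.log(first._id, second._id, third); // 2 1 null
```

Two workers calling the real findAndModify at the same moment would each receive a different job, because the match and the update happen as one operation.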

Page 57: MongoDB Berlin Schema Design

Queue

{ inprogress: true,                                // updated
  priority: 1,
  started: ISODate("2011-09-18T09:56:06.298Z"),    // added
  ...
}

Page 58: MongoDB Berlin Schema Design

Inventory

• User has a number of "votes" they can use
• A finite stock that you can "sell"
• A resource that can be "provisioned"

Page 59: MongoDB Berlin Schema Design

Inventory

// Number of votes and who user voted for
{ _id: "alvin",
  votes: 42,
  voted_for: [] }

// Subtract a vote and add the blog voted for
db.user.update(
    { _id: "alvin",
      votes: {$gt: 0},
      voted_for: {$ne: "Destination Moon"} },
    { "$push": {voted_for: "Destination Moon"},
      "$inc":  {votes: -1} })
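The important property of this update is that the conditions and the modification belong to one operation. A small in-memory sketch of that check-and-apply logic follows; castVote is a hypothetical helper, and the starting count of 2 votes is chosen only to keep the example short.

```javascript
const user = { _id: "alvin", votes: 2, voted_for: [] };

// Hypothetical helper mirroring the conditional update: the change is applied
// only if the user has votes left and has not already voted for this blog.
function castVote(doc, blog) {
  const matches = doc.votes > 0 && !doc.voted_for.includes(blog);
  if (!matches) return false; // the query matched nothing, so nothing is modified
  doc.voted_for.push(blog);
  doc.votes -= 1;
  return true;
}

console.log(castVote(user, "Destination Moon")); // true
console.log(castVote(user, "Destination Moon")); // false: already voted
console.log(user.votes); // 1
```

Putting the guards into the query part means an exhausted or repeated vote simply matches no document, so the count can never go negative.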

Page 60: MongoDB Berlin Schema Design

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the application manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

Page 61: MongoDB Berlin Schema Design

@mongodb

conferences, appearances, and meetups
http://www.10gen.com/events

http://bit.ly/mongo>

Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

[email protected]