Schema Design by Gary Murakami
![Page 1: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/1.jpg)
Lead Engineer / Evangelist
Gary J. Murakami, Ph.D.
#MongoDB
Schema Design
![Page 2: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/2.jpg)
Schema Design – Gary Murakami
![Page 3: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/3.jpg)
Chess 4.5 (Northwestern University)
Larry Atkin & Dave Slate
chessprogramming.wikispaces.com
![Page 4: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/4.jpg)
Agenda
• What is a Record?
• Core Concepts
• What is an Entity?
• Associating Entities
• General Recommendations
• Questions
![Page 5: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/5.jpg)
All application development is schema design
![Page 6: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/6.jpg)
Success comes from proper data structure
![Page 7: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/7.jpg)
What is a Record?
![Page 8: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/8.jpg)
Key → Value
• One-dimensional
• Single value is a blob
• Query on key only
• No schema
• Value cannot be updated, only replaced
[Diagram: each key maps to a single opaque blob]
![Page 9: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/9.jpg)
Relational
• Two-dimensional (tuples)
• Each field is a single value
• Query on any field
• Very structured schema (table)
• In-place updates *
• Normalization requires many tables, joins, and indexes, and results in poor data locality and performance
[Diagram: table rows identified by a PrimaryKey]
![Page 10: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/10.jpg)
Document
• N-dimensional
• Each field can contain 0, 1, many, or embedded values
• Query on any field & level
• Flexible schema
• Inline updates *
• Embedding related data gives optimal data locality, requires fewer indexes, and has better performance
[Diagram: documents identified by _id]
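To make the three record shapes concrete, here is a minimal pure-Python sketch (an illustration, not MongoDB or driver code) that stores the same contact as a key → blob pair, as two relational tables, and as one document, then reads the city from each. All variable and function names are hypothetical. It shows how many lookups each model needs to reach a related value, and that a key-value blob can only be replaced as a whole.

```python
import json

# Key -> value: the store sees only an opaque blob of bytes.
kv_store = {"contact:2": json.dumps({"name": "Steven Jobs", "city": "Cupertino"}).encode()}

# Relational: flat tuples in two tables, linked by a foreign key.
contacts_table = [(2, "Steven Jobs", 1)]     # (id, name, address_id)
addresses_table = [(1, "Cupertino", "CA")]   # (id, city, state)

# Document: related data nested inside one record, identified by _id.
contact_doc = {"_id": 2, "name": "Steven Jobs",
               "address": {"city": "Cupertino", "state": "CA"}}


def kv_city(key):
    # The store cannot look inside the blob; the client decodes all of it.
    return json.loads(kv_store[key])["city"]


def kv_set_city(key, city):
    # No partial update: read the whole value, modify it, write it all back.
    record = json.loads(kv_store[key])
    record["city"] = city
    kv_store[key] = json.dumps(record).encode()


def relational_city(contact_id):
    # Two table lookups (a join) to reach the address row.
    address_id = next(row[2] for row in contacts_table if row[0] == contact_id)
    return next(row[1] for row in addresses_table if row[0] == address_id)


def document_city(doc):
    # The address already sits inside the record: one lookup.
    return doc["address"]["city"]
```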
![Page 11: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/11.jpg)
Core Concepts
![Page 12: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/12.jpg)
Traditional Schema Design
Focus on data storage
![Page 13: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/13.jpg)
Document Schema Design
Focus on data use
![Page 14: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/14.jpg)
Another way to think about it
Traditional: What answers do I have?
Document: What questions do I have?
![Page 15: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/15.jpg)
Three Building Blocks of Document Schema Design
![Page 16: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/16.jpg)
1 – Flexibility
• Choices for schema design
• Each record can have different fields
• Field names consistent for programming
• Common structure can be enforced by application
• Easy to evolve as needed
![Page 17: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/17.jpg)
2 – Arrays: Multiple Values per Field
• Each field can be:
  – Absent
  – Set to null
  – Set to a single value
  – Set to an array of many values
• Query for any matching value
  – Can be indexed, and each value in the array is in the index
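The array rules above can be sketched in plain Python. This is a simplified model of MongoDB's behavior, not its implementation: `matches` treats an equality query on an array field as satisfied when any element (or the whole array) equals the value, and `build_multikey_index` adds one index entry per array element. Both function names are hypothetical, and the index deliberately skips absent and null fields for brevity, whereas MongoDB indexes those as null.

```python
from collections import defaultdict


def matches(doc, field, value):
    """Simplified MongoDB-style equality match on a top-level field."""
    if field not in doc:
        # An absent field behaves like null in an equality query.
        return value is None
    actual = doc[field]
    if isinstance(actual, list):
        # An array matches if it equals the value or any element does.
        return actual == value or value in actual
    return actual == value


def build_multikey_index(docs, field):
    """Map each value to the _ids holding it; arrays add one entry per element."""
    index = defaultdict(set)
    for doc in docs:
        values = doc.get(field)
        if values is None:
            continue  # simplification: MongoDB would index these as null
        for v in values if isinstance(values, list) else [values]:
            index[v].add(doc["_id"])
    return index


docs = [
    {"_id": 1, "tags": ["chess", "mongodb"]},  # array of many values
    {"_id": 2, "tags": "chess"},               # single value
    {"_id": 3, "tags": None},                  # set to null
    {"_id": 4},                                # absent
]
```

A query for `"chess"` finds documents 1 and 2, and the index entry for `"chess"` points at both, which is why one array element is enough to make a document reachable through the index.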
![Page 18: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/18.jpg)
3 – Embedded Documents
• Any value can be a document
• Nested documents provide structure
• Query any field at any level
  – Can be indexed
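Querying "any field at any level" relies on dotted paths such as `address.city`. The sketch below resolves such a path through nested dictionaries in plain Python; `get_path` and `find` are hypothetical helpers, and the `_id` values on the sample contacts are made up. With a real driver such as PyMongo, the equivalent query would be written `contacts.find({"address.city": "Cupertino"})`, and `contacts.create_index("address.city")` would index the nested field.

```python
def get_path(doc, path):
    """Follow a dotted path such as "address.city" through nested documents."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing at some level
        current = current[key]
    return current


def find(docs, path, value):
    """Return the documents whose value at `path` equals `value`."""
    return [d for d in docs if get_path(d, path) == value]


contacts = [
    {"_id": 2, "name": "Steven Jobs",
     "address": {"street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA"}},
    {"_id": 3, "name": "Larry Page",
     "address": {"street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA"}},
]
```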
![Page 19: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/19.jpg)
Belle and Endgame tablebases
Play chess with God – Ken Thompson
![Page 20: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/20.jpg)
What is an Entity?
![Page 21: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/21.jpg)
An Entity
• Object in your model
• Associations with other entities
Referencing (Relational) → Embedding (Document)
• has_one → embeds_one
• belongs_to → embedded_in
• has_many → embeds_many
• has_and_belongs_to_many
MongoDB has both referencing and embedding for universal coverage.
![Page 22: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/22.jpg)
Let's model something together: how about a business card?
![Page 23: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/23.jpg)
Business Card
![Page 24: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/24.jpg)
Referencing
Contacts
{ "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1 }
Addresses
{ "_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }
![Page 25: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/25.jpg)
Embedding
Contacts
{ "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer",
  "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" },
  "phone": "408-996-1010" }
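The practical difference between the two business-card layouts is the number of reads needed to show the whole card. The sketch below uses a hypothetical in-memory `FakeCollection` that counts reads (it is not a driver class): the referencing layout needs a second read to follow `address_id`, while the embedding layout returns everything in one.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a collection that counts reads."""

    def __init__(self, docs):
        self.docs = {d["_id"]: d for d in docs}
        self.reads = 0

    def get(self, _id):
        self.reads += 1
        return self.docs.get(_id)


# Referencing: the contact stores address_id; the address lives elsewhere.
ref_contacts = FakeCollection([{"_id": 2, "name": "Steven Jobs", "address_id": 1}])
addresses = FakeCollection([{"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino"}])

# Embedding: the address is part of the contact document.
emb_contacts = FakeCollection([{"_id": 2, "name": "Steven Jobs",
                                "address": {"street": "10260 Bandley Dr", "city": "Cupertino"}}])


def card_by_reference(contact_id):
    contact = ref_contacts.get(contact_id)
    address = addresses.get(contact["address_id"])  # second round trip
    return {"name": contact["name"], "address": address}


def card_by_embedding(contact_id):
    return emb_contacts.get(contact_id)  # one read returns everything
```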
![Page 26: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/26.jpg)
Relational Schema
Contact • name • company • title • phone
Address • street • city • state • zip_code
![Page 27: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/27.jpg)
Document Schema
Contact • name • company • title • phone • address (embedded: street • city • state • zip_code)
![Page 28: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/28.jpg)
How are they different? Why?
[Side by side: the relational schema (separate Contact and Address tables) and the document schema (Contact with an embedded address)]
![Page 29: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/29.jpg)
Schema Flexibility
{ "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer",
  "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014" },
  "phone": "408-996-1010" }
{ "name": "Larry Page", "url": "http://google.com/", "title": "CEO", "company": "Google!", "email": "[email protected]",
  "address": { "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" },
  "phone": "650-618-1499", "fax": "650-330-0100" }
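The two cards above carry different fields, yet both remain usable because the application, not the database, enforces a common core. Below is a minimal sketch of such an application-side rule. The `REQUIRED` mapping and the `problems` function are hypothetical; extra fields such as `url` or `fax` are allowed.

```python
# Hypothetical application-level rule: fields every contact must carry.
REQUIRED = {"name": str, "company": str, "phone": str}


def problems(doc):
    """Return a list of rule violations; extra fields are allowed."""
    issues = []
    for field, expected_type in REQUIRED.items():
        if field not in doc:
            issues.append(f"missing {field}")
        elif not isinstance(doc[field], expected_type):
            issues.append(f"{field} should be {expected_type.__name__}")
    return issues


jobs = {"name": "Steven Jobs", "title": "VP, New Product Development",
        "company": "Apple Computer", "address": {"city": "Cupertino"},
        "phone": "408-996-1010"}
page = {"name": "Larry Page", "url": "http://google.com/", "title": "CEO",
        "company": "Google!", "address": {"city": "Palo Alto"},
        "phone": "650-618-1499", "fax": "650-330-0100"}
```

Both sample cards pass the check even though `page` has two fields that `jobs` lacks.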
![Page 30: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/30.jpg)
Longest “Database Endgame” Mate
• Augment schema with metadata
  – Distance to mate (DTM)
  – Distance to conversion (DTC)
• Retrograde analysis of DB
• Longest checkmate
  – 6 piece: 262 moves, KRNKNN
  – 7 piece: 517 moves, so far
    • Completion by 2015
![Page 31: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/31.jpg)
Example
![Page 32: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/32.jpg)
Let’s Look at an Address Book
![Page 33: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/33.jpg)
Address Book
• What questions do I have?
• What are my entities?
• What are my associations?
![Page 34: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/34.jpg)
Address Book Entity-Relationship
Contacts • name • company • title
• Addresses • type • street • city • state • zip_code (one contact to many)
• Phones • type • number (one to many)
• Emails • type • address (one to many)
• Thumbnails • mime_type • data (one to one)
• Portraits • mime_type • data (one to one)
• Twitters • name • location • web • bio (one to one)
• Groups • name (many to many)
![Page 35: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/35.jpg)
Associating Entities
![Page 36: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/36.jpg)
One to One
[Address book entity-relationship diagram, repeated from the previous slide]
![Page 37: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/37.jpg)
One to One: Schema Design Choices
• contact.twitter_id references twitter (1 : 1)
  – Saves a fetch if there is no twitter
• twitter.contact_id references contact (1 : 1)
• Tracking the relationship on both sides is redundant
  – Both references must be updated for consistency
• Contact embeds twitter (1 : 1)
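Why tracking the reference on both sides costs extra work can be seen in a small sketch. The dictionaries below stand in for two collections, and `link`, `twitter_for`, and `consistent` are hypothetical helpers; the twitter `_id` of 7 is made up. Linking takes two writes, a missing `twitter_id` lets the reader skip the second fetch, and a single stale write leaves the two sides disagreeing.

```python
# Hypothetical in-memory collections keyed by _id.
contacts = {1: {"_id": 1, "name": "Gary J. Murakami, Ph.D."}}
twitters = {7: {"_id": 7, "name": "GaryMurakami"}}


def link(contact_id, twitter_id):
    # Tracking both sides means two writes that must stay in step.
    contacts[contact_id]["twitter_id"] = twitter_id
    twitters[twitter_id]["contact_id"] = contact_id


def twitter_for(contact_id):
    twitter_id = contacts[contact_id].get("twitter_id")
    if twitter_id is None:
        return None  # no reference stored, so no second fetch is needed
    return twitters[twitter_id]


def consistent(contact_id):
    # True when there is no twitter, or when its back-reference agrees.
    twitter = twitter_for(contact_id)
    return twitter is None or twitter.get("contact_id") == contact_id
```

Embedding the twitter subdocument in the contact removes both the second fetch and the consistency check.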
![Page 38: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/38.jpg)
One to One: General Recommendation
• Full contact info all at once
  – Contact embeds twitter
• Parent-child relationship
  – “contains”
• No additional data duplication
• Can query or index on embedded field
  – e.g., “twitter.name”
[Diagram: Contact embeds twitter (1 : 1)]
![Page 39: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/39.jpg)
One to Many
[Address book entity-relationship diagram, repeated from the previous slide]
![Page 40: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/40.jpg)
One to Many: Schema Design Choices
• contact.phone_ids: [ ] references phones (1 : N)
  – Not possible in relational DBs
  – Saves a fetch if there are no phones
• phone.contact_id references contact (1 : N)
• Tracking the relationship on both sides is redundant
  – Both references must be updated for consistency
• Contact embeds phones (1 : N)
![Page 41: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/41.jpg)
One to Many: General Recommendation
• Full contact info all at once
  – Contact embeds multiple phones
• Parent-children relationship
  – “contains”
• No additional data duplication
• Can query or index on any field
  – e.g., { "phones.type": "mobile" }
[Diagram: Contact embeds phones (1 : N)]
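A query like `{ "phones.type": "mobile" }` has to look inside every element of the embedded `phones` array. The plain-Python sketch below models that path traversal: `values_at` collects every value reachable at a dotted path, descending into arrays, and `find` keeps documents where the target is among them. Both helpers and the phone numbers are hypothetical. With PyMongo the real query would be `contacts.find({"phones.type": "mobile"})`. As in MongoDB, separate conditions in one query may be satisfied by different array elements; MongoDB's `$elemMatch` operator is the tool for requiring a single element to match them all.

```python
def values_at(node, keys):
    """Collect all values at a dotted path, descending into arrays of documents."""
    if not keys:
        return [node]
    if isinstance(node, list):
        found = []
        for element in node:
            found.extend(values_at(element, keys))
        return found
    if isinstance(node, dict) and keys[0] in node:
        return values_at(node[keys[0]], keys[1:])
    return []


def find(docs, query):
    """Keep documents where, for every path, some reachable value equals the target."""
    return [d for d in docs
            if all(target in values_at(d, path.split("."))
                   for path, target in query.items())]


contacts = [
    {"_id": 1, "phones": [{"type": "work", "number": "555-0100"},
                          {"type": "mobile", "number": "555-0101"}]},
    {"_id": 2, "phones": [{"type": "work", "number": "555-0102"}]},
]
```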
![Page 42: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/42.jpg)
Many to Many
[Address book entity-relationship diagram, repeated from the previous slide]
![Page 43: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/43.jpg)
Many to Many: Traditional Relational Association
Join table
Contacts • name • company • title • phone
Groups • name
GroupContacts • group_id • contact_id
[The GroupContacts join table is crossed out]
Use arrays instead
![Page 44: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/44.jpg)
Many to Many: Schema Design Choices
• group.contact_ids: [ ] references contacts (N : N)
• contact.group_ids: [ ] references groups (N : N)
  – Redundant to track the relationship on both sides; both references must be updated for consistency
• group embeds contacts (N : N)
• contact embeds groups (N : N)
  – Redundant to track the relationship on both sides; duplicated data must be updated for consistency
![Page 45: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/45.jpg)
Many to Many: General Recommendation
• Depends on use case
  1. Simple address book
     • Contact references groups
  2. Corporate email groups
     • Group embeds contacts for performance
[Diagram: contact.group_ids: [ ] references groups (N : N)]
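For the simple address book case, "contact references groups" replaces the join table with a `group_ids` array on each contact. The sketch below shows both directions of the association in plain Python. The group names and `_id` values are made up, and `members`, `groups_of`, and `add_to_group` are hypothetical helpers. In MongoDB itself, the membership query would be an equality match on the array field (`{"group_ids": 20}`), and the no-duplicates append resembles the `$addToSet` update operator.

```python
# Hypothetical sample data: each contact stores an array of group _ids.
groups = {10: {"_id": 10, "name": "Chess"}, 20: {"_id": 20, "name": "MongoDB"}}
contacts = [
    {"_id": 1, "name": "Gary J. Murakami, Ph.D.", "group_ids": [10, 20]},
    {"_id": 2, "name": "Steven Jobs", "group_ids": [20]},
]


def members(group_id):
    # Like querying {"group_ids": group_id}: array fields match any element.
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


def groups_of(contact_id):
    contact = next(c for c in contacts if c["_id"] == contact_id)
    return [groups[g]["name"] for g in contact["group_ids"]]


def add_to_group(contact_id, group_id):
    # Only the contact side changes: no join table, no duplicate entries.
    contact = next(c for c in contacts if c["_id"] == contact_id)
    if group_id not in contact["group_ids"]:
        contact["group_ids"].append(group_id)
```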
![Page 46: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/46.jpg)
Contacts • name • company • title
• addresses • type • street • city • state • zip_code (1 : N)
• phones • type • number (1 : N)
• emails • type • address (1 : N)
• thumbnail • mime_type • data (1 : 1)
• twitter • name • location • web • bio (1 : 1)
• Portraits • mime_type • data (1 : 1)
• Groups • name (N : N)
[Lowercase entities are embedded in the contact document; Portraits and Groups remain separate collections]
Document model: holistic and efficient representation
![Page 47: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/47.jpg)
Contact document example
{ "name": "Gary J. Murakami, Ph.D.",
  "company": "10gen (the MongoDB company)",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": { "name": "GaryMurakami", "location": "New Providence, NJ", "web": "http://www.nobell.org" },
  "portrait_id": 1,
  "addresses": [ { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" } ],
  "phones": [ { "type": "work", "number": "1-866-237-8815 x8015" } ],
  "emails": [ { "type": "work", "address": "[email protected]" }, { "type": "home", "address": "[email protected]" } ] }
![Page 48: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/48.jpg)
Can We Solve Chess One Day?
• Chess tablebase problem
  – Chess programs often play worse
  – Search is not localized: poor cache performance, seeks
  – Working set too large for memory
• Endgame database size: big data
  – 5 piece: 7 GB, compressed 75%
    • 157 MB Shredderbase – 1000x
    • 441 MB Shredderbase – 10,000x
  – 6 piece: 1.2 TB compressed
  – 7 piece: 70 TB estimated by 2015
![Page 49: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/49.jpg)
Working Set
1. To reduce the working set
   – Reference less-used data instead of embedding
     • Extract it into a referenced child document
   – Reference bulk data, e.g., portrait
2. To increase resources
   – Read from secondaries in a replica set
   – Use sharding
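Moving bulk data such as a portrait out of the parent keeps the frequently read contact documents small. The sketch below pulls a subdocument into a separate collection and leaves a `<field>_id` reference behind, the same shape as `"portrait_id": 1` in the contact example. The function name, the `portraits` dictionary, the sample bytes, and the `len(target) + 1` id scheme are all hypothetical simplifications.

```python
def extract_to_reference(doc, field, target):
    """Move the subdocument doc[field] into `target` and keep `<field>_id`.

    Returns a new, slimmer parent document; `doc` itself is not modified.
    """
    if field not in doc:
        return dict(doc)
    new_id = len(target) + 1  # simplistic id scheme, adequate for a sketch
    target[new_id] = {"_id": new_id, **doc[field]}
    slim = {k: v for k, v in doc.items() if k != field}
    slim[field + "_id"] = new_id
    return slim


portraits = {}  # hypothetical separate collection for bulk data
contact = {"name": "Gary J. Murakami, Ph.D.",
           "portrait": {"mime_type": "image/jpeg", "data": b"\xff\xd8" * 50_000}}
slim = extract_to_reference(contact, "portrait", portraits)
```

The slim contact now carries only a small integer, so reading it no longer pulls 100,000 bytes of image data into memory.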
![Page 50: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/50.jpg)
General Recommendations
![Page 51: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/51.jpg)
Embedding over Referencing
• Embed
  – When “one” or “many” objects are viewed with their parent
  – For performance
  – For atomicity
• Reference
  – When you need more scaling: max document size is 16 MB
  – For easy “many to many” associations
  – For smaller parent documents and working set
![Page 52: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/52.jpg)
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design
   1. Measure performance and find bottlenecks
   2. Denormalize by embedding
      1. One to one associations first
      2. One to many associations next
      3. Many to many associations last
3. Examine, measure and analyze, review concerns, scaling
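The "denormalize by embedding" step can be sketched on rows copied from relational tables. The function below embeds the one-to-one twitter row first and then the one-to-many phone rows, dropping the now-redundant `id` and `contact_id` columns. Many-to-many groups are left out, in line with keeping them referenced. The table layout, row ids, and the `denormalize` name are hypothetical.

```python
from collections import defaultdict


def _without(row, *keys):
    return {k: v for k, v in row.items() if k not in keys}


def denormalize(contacts, twitters, phones):
    """Turn relational rows into contact documents with embedded children."""
    # Step 1: one-to-one rows (twitter) become embedded subdocuments.
    twitter_by_contact = {t["contact_id"]: _without(t, "id", "contact_id") for t in twitters}
    # Step 2: one-to-many rows (phones) become embedded arrays.
    phones_by_contact = defaultdict(list)
    for p in phones:
        phones_by_contact[p["contact_id"]].append(_without(p, "id", "contact_id"))
    docs = []
    for c in contacts:
        doc = {"_id": c["id"], **_without(c, "id")}
        if c["id"] in twitter_by_contact:
            doc["twitter"] = twitter_by_contact[c["id"]]
        doc["phones"] = phones_by_contact.get(c["id"], [])
        docs.append(doc)
    return docs


contacts = [{"id": 1, "name": "Gary J. Murakami, Ph.D."}, {"id": 2, "name": "Steven Jobs"}]
twitters = [{"id": 5, "contact_id": 1, "name": "GaryMurakami"}]
phones = [{"id": 10, "contact_id": 1, "type": "work", "number": "1-866-237-8815 x8015"},
          {"id": 11, "contact_id": 2, "type": "work", "number": "408-996-1010"}]
docs = denormalize(contacts, twitters, phones)
```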
![Page 53: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/53.jpg)
New Application
1. Focus on your application
   1. Requests
   2. Responses
   3. Business-domain model objects / data structures
2. Then persist language object data to MongoDB
   1. Collections
   2. Associations
   3. Refactor for optimization and add indices
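Starting from domain objects and only then persisting them might look like the following in Python. This is a sketch under the assumption that the objects are modeled as dataclasses (the class and function names are hypothetical). `dataclasses.asdict` recursively turns nested objects into the dicts and lists a driver can store, so embedded one-to-one and one-to-many associations fall out of the object structure. With PyMongo, the resulting dict could then be passed to `collection.insert_one`.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Phone:
    type: str
    number: str


@dataclass
class Twitter:
    name: str
    location: str = ""


@dataclass
class Contact:
    name: str
    company: str
    title: str
    twitter: Optional[Twitter] = None                  # one to one: embed
    phones: List[Phone] = field(default_factory=list)  # one to many: embed


def to_document(contact):
    """Convert a domain object into a plain dict ready to store as a document."""
    doc = asdict(contact)  # recurses into nested dataclasses and lists
    return {k: v for k, v in doc.items() if v is not None}  # omit absent fields
```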
![Page 54: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/54.jpg)
It’s All About Your Application
• Your schema is the impedance matcher
  – Design choices: normalize/denormalize, reference/embed
  – Melds programming with MongoDB for the best of both
  – Flexible for development and change
• Programs + Databases = (Big) Data Applications
![Page 55: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/55.jpg)
• Programs × MongoDB = Great Big Data Applications
• Play chess with God
![Page 56: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/56.jpg)
• Play music with God – AAC
![Page 57: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/57.jpg)
![Page 58: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/58.jpg)
Questions?
“His pattern indicates two-dimensional thinking.” – Spock, Star Trek II: The Wrath of Khan
www.3dchessfederation.com
![Page 59: Schema Design by Gary Murakami](https://reader038.vdocuments.net/reader038/viewer/2022102815/5568027ed8b42a242a8b48a2/html5/thumbnails/59.jpg)
Thank you so much to the community members who made An Evening with MongoDB Minneapolis possible:
• David Hussman
• Josh Kennedy
• Matthew Chimento
• Jeffrey Lemmerman
• Dan Chamberlain
• Christopher Rueber
• Erin Newkirk
Thank you, DevJam, for hosting our event!