Search with a Key-Value Store
Intro to NoSQL
•Key-value store
•Schemaless
•Distributed
•Eventually Consistent
Key-Value
•Single unique key for each value in the database
•Extremely fast look-up
•Easy distribution (no such thing as joins)
Schemaless
•Critical for extremely large data sets, where altering a table schema can be prohibitively slow
•No ALTER TABLE commands: values have no predefined fields
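To make the schemaless point concrete, here is a plain-Ruby sketch (Ruby being the language of the MongoMapper stack named later in the deck) in which two values in the same hypothetical `users` collection carry different fields. A Hash stands in for the store; nothing here is MongoDB's API.

```ruby
# Two values in the same hypothetical "users" collection with different
# fields. Adding a field is just writing it; there is no ALTER TABLE step.
users = {
  "user:1" => { user_name: "alice", birthday: "1980-01-01" },
  "user:2" => { user_name: "bob" } # no birthday field at all
}

# Start storing a new field on one record only.
users["user:2"][:image_url] = "http://example.com/bob.png"

# Readers must tolerate missing fields, since nothing enforces a schema.
birthdays = users.transform_values { |u| u.fetch(:birthday, "unknown") }
```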
Distributed
•The data set is designed to be spread (sharded) across multiple machines
•Typically makes use of commodity servers with enough RAM to keep the entire data set in memory
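The "easy distribution" claim follows from single-key access: if the node holding a value can be computed from its key, each lookup touches exactly one machine. The sketch below assumes a naive scheme, CRC32 of the key modulo a hypothetical list of three nodes, with in-memory Hashes standing in for servers.

```ruby
require "zlib"

# Hypothetical node names; a real cluster would use actual server addresses.
NODES = %w[node-a node-b node-c].freeze

# Pick a node deterministically from the key alone. Because every value is
# addressed by a single key and there are no joins, a lookup never needs
# more than one node.
def node_for(key, nodes = NODES)
  nodes[Zlib.crc32(key.to_s) % nodes.size]
end

# Each node holds its own slice of the data set (kept in memory).
cluster = NODES.to_h { |n| [n, {}] }

def put(cluster, key, value)
  cluster[node_for(key)][key] = value
end

def get(cluster, key)
  cluster[node_for(key)][key]
end

put(cluster, "story:dgi3ck", { headline: "Hello" })
get(cluster, "story:dgi3ck") # returns the stored Hash from exactly one node
```

This modulo scheme remaps most keys whenever a node is added or removed; MongoDB's auto-sharding instead splits the shard-key space into chunks that can be migrated between shards.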
Eventually Consistent
•A success response can be returned to the client before replica nodes have received the change, so replicas may briefly serve stale data
•This makes NoSQL stores problematic for highly sensitive transactions (finance, etc.)
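The following toy model, a simplification rather than any real replication protocol, shows why an acknowledged write can still be invisible on a replica. `Primary`, `Replica` and `sync!` are hypothetical names.

```ruby
# A toy primary/replica pair. The primary acknowledges a write immediately
# and only queues it for the replica; the replica catches up on sync!.
class Primary
  attr_reader :data, :pending

  def initialize
    @data = {}
    @pending = [] # writes not yet applied on the replica
  end

  def write(key, value)
    @data[key] = value
    @pending << [key, value]
    :ok # success is returned before the replica has the change
  end
end

class Replica
  attr_reader :data

  def initialize
    @data = {}
  end

  # Apply every queued write in order, then clear the queue.
  def sync!(primary)
    primary.pending.each { |key, value| @data[key] = value }
    primary.pending.clear
  end
end

primary = Primary.new
replica = Replica.new

primary.write("balance:42", 100) # client already sees :ok
replica.data["balance:42"]       # still nil: a stale read
replica.sync!(primary)
replica.data["balance:42"]       # now 100: the copies have converged
```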
Database Design in NoSQL
•Denormalization is your friend
•Think of collections as views on a data set that
A News Site Using SQL
Users: id, user_name, birthday
Stories: id, date, headline, content
Comments: id, story_id, parent_id, user_id, content
Loading a Story with SQL
SELECT * FROM stories WHERE id = x;
SELECT * FROM comments
  LEFT JOIN users ON users.id = comments.user_id
  LEFT JOIN comments children ON children.parent_id = comments.id
WHERE comments.story_id = x;
Redesigned in a NoSQL Data Store
Story #dgi3ck
  date
  headline
  content
  comments
    Comment #la529
      content
      username
      user_image_url
      user_id
      children
        Comment #mn34i
          content
          username
          user_image_url
          user_id
    Comment #5bg26
      content
      username
      user_image_url
      user_id
      children
Loading a Story with NoSQL
Stories::get(dgi3ck)
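As a minimal sketch of why the NoSQL version needs only one call, the story below is stored as a single Ruby Hash under its key, with comments and the cached user fields embedded. Field names follow the slide; the values, `STORIES`, `get_story` and `count_comments` are illustrative assumptions, and the nesting assumes #mn34i is a reply to #la529.

```ruby
# The redesigned story as one value stored under its key.
# Field names follow the slide; the values are made up for illustration.
STORIES = {
  "dgi3ck" => {
    date: "2011-01-01",
    headline: "Example headline",
    content: "Story body...",
    comments: [
      {
        id: "la529", content: "First!", username: "alice",
        user_image_url: "http://example.com/alice.png", user_id: 1,
        children: [
          { id: "mn34i", content: "Reply", username: "bob",
            user_image_url: "http://example.com/bob.png", user_id: 2,
            children: [] }
        ]
      },
      { id: "5bg26", content: "Another comment", username: "carol",
        user_image_url: "http://example.com/carol.png", user_id: 3,
        children: [] }
    ]
  }
}.freeze

# Equivalent of Stories::get(dgi3ck): one key lookup returns the story, its
# comment tree and the user details needed to render it. No joins.
def get_story(id)
  STORIES.fetch(id)
end

# Walk the embedded tree to count every comment, including replies.
def count_comments(comments)
  comments.sum { |c| 1 + count_comments(c[:children]) }
end

story = get_story("dgi3ck")
count_comments(story[:comments]) # => 3
```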
Some Design Considerations
•What is the context in which we will access this data?
•What data do we need to access outside of this context?
•How often does the data change?
Embedded Data
•NoSQL stores can still reference other records by key, much like foreign keys (usually without enforced referential integrity)
•Some data is more appropriately stored “embedded” in a parent context
•E.g. Comments are rarely (if ever) accessed outside of their parent Story
Cached Data
•Data from another object that must be shown outside its own context can be cached (copied) into the current record
•Keep in mind that cached copies may need to be updated
•E.g. when a user changes their username, the cached username in their Comments must be updated too
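Keeping cached data in sync means finding every copy. The sketch below assumes the nested comment structure from the story example, held in memory, and a hypothetical `rename_user` helper; against a real store this would be an update issued to the database rather than an in-memory walk.

```ruby
# Each comment caches the author's username next to the user_id reference.
# Hypothetical in-memory data standing in for stored stories.
stories = [
  { id: "dgi3ck", comments: [
    { user_id: 1, username: "alice", children: [
      { user_id: 2, username: "bob", children: [] }
    ] },
    { user_id: 1, username: "alice", children: [] }
  ] }
]

# Recursively refresh the cached username on every comment by this user.
# Returns how many cached copies were changed.
def rename_in_comments(comments, user_id, new_name)
  comments.sum do |c|
    changed = 0
    if c[:user_id] == user_id && c[:username] != new_name
      c[:username] = new_name
      changed = 1
    end
    changed + rename_in_comments(c[:children], user_id, new_name)
  end
end

def rename_user(stories, user_id, new_name)
  stories.sum { |s| rename_in_comments(s[:comments], user_id, new_name) }
end

rename_user(stories, 1, "alice_b") # => 2
```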
Several common NoSQL Stores
•Memcached
•BigTable
•SimpleDB
•MongoDB
Why we chose MongoDB
•Auto-sharding and easy setup for distribution
•JavaScript API
•Powerful indexing capabilities
MongoDB Libraries
•ORM: mongo_mapper
• https://github.com/jnunemaker/mongomapper
•Underlying Connection: mongo
• https://github.com/mongodb/mongo-ruby-driver
•BSON support: bson_ext
• http://rubygems.org/gems/bson_ext
Lifebooker’s Availability Search
•Searches across Services
•Filters
  •Time/Date
  •Geographical Zone
  •Service Category
  •Practitioner Gender
  •Concurrent Availability
  •(and several more)
Services, Discounts and Practitioners
•Services are offered by Providers
•Providers have Practitioners (Employees)
•Discounts are applied to Providers for a Service during a given time window
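One plausible way to model these relationships as a single document is to embed Practitioners and Services in the Provider, and Discounts (each with a time window) in the Service. The slides do not show Lifebooker's actual schema; every field name below, and the `discount_at` helper, is hypothetical.

```ruby
require "time"

# One Provider document with Practitioners, Services and Discounts embedded.
# All field names are hypothetical.
provider = {
  name: "Example Spa",
  zone: "manhattan",
  practitioners: [
    { name: "Pat", gender: "female" },
    { name: "Sam", gender: "male" }
  ],
  services: [
    {
      name: "60-minute massage",
      category: "massage",
      price: 120,
      discounts: [
        # A discount applies to this provider's service during a time window.
        { starts_at: Time.parse("2011-05-02 09:00"), ends_at: Time.parse("2011-05-02 12:00"), percent: 30 },
        { starts_at: Time.parse("2011-05-02 14:00"), ends_at: Time.parse("2011-05-02 17:00"), percent: 20 }
      ]
    }
  ]
}

# Discount percentage for a service at a given moment (0 if none applies).
# Windows include their start time and exclude their end time.
def discount_at(service, time)
  d = service[:discounts].find { |x| x[:starts_at] <= time && time < x[:ends_at] }
  d ? d[:percent] : 0
end

massage = provider[:services].first
discount_at(massage, Time.parse("2011-05-02 10:30")) # => 30
discount_at(massage, Time.parse("2011-05-02 13:00")) # => 0
```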
Modeling this Data in MongoDB
Embedding with MongoMapper
Indexing and Searching
•MongoDB offers powerful indexing capabilities
•Arrays are “first-class citizens”: an array field can be indexed element by element
•Compound (multi-field) indices keep complex queries fast
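"Arrays are first-class citizens" refers to MongoDB's ability to index an array field element by element (a multikey index), so a query for one value matches any document whose array contains it. The toy index below mimics that idea in plain Ruby with hypothetical documents; it is not how MongoDB stores its indexes.

```ruby
# A toy "multikey" index: when the indexed field holds an array, every
# element gets its own entry pointing back at the document id.
docs = {
  1 => { name: "Massage", categories: %w[massage spa] },
  2 => { name: "Facial",  categories: %w[facial spa] },
  3 => { name: "Haircut", categories: %w[hair] }
}

def build_multikey_index(docs, field)
  index = Hash.new { |h, k| h[k] = [] }
  docs.each do |id, doc|
    # Array() wraps a scalar, so single values are indexed the same way.
    Array(doc[field]).each { |value| index[value] << id }
  end
  index
end

index = build_multikey_index(docs, :categories)
index["spa"]  # => [1, 2]
index["hair"] # => [3]
```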
Creating Meta-Data
•With complex data structures, computing meta-data in a before_save callback makes that data easily searchable
•E.g. the maximum discount on a given day for a service
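A sketch of the meta-data idea, under the assumption that discounts are stored per day: a hook run just before saving flattens them into a `max_discount_by_day` field that is cheap to index and query. The class is plain Ruby standing in for a MongoMapper model with a `before_save` callback; all names are hypothetical.

```ruby
require "date"

# Plain-Ruby stand-in for a model with a before_save hook; here `save`
# simply calls the hook itself instead of persisting anything.
class Service
  attr_reader :discounts, :max_discount_by_day

  def initialize(discounts)
    @discounts = discounts # [{ day: Date, percent: Integer }, ...]
    @max_discount_by_day = {}
  end

  def save
    before_save
    true # a real model would write to the database here
  end

  private

  # Precompute a flat, easily indexed summary: the best discount per day.
  def before_save
    @max_discount_by_day = discounts
      .group_by { |d| d[:day].iso8601 }
      .transform_values { |ds| ds.map { |d| d[:percent] }.max }
  end
end

service = Service.new([
  { day: Date.new(2011, 5, 2), percent: 30 },
  { day: Date.new(2011, 5, 2), percent: 20 },
  { day: Date.new(2011, 5, 3), percent: 10 }
])
service.save
service.max_discount_by_day # maps "2011-05-02" to 30 and "2011-05-03" to 10
```

String keys are used because MongoDB field names are strings.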
Creating Indices
Querying
•Uses DataMapper/Arel Syntax
•Chains conditions, ordering and offset
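To illustrate the chaining style, here is a tiny, hypothetical `Query` class over an in-memory array. The method names (`where`, `sort`, `skip`, `limit`, `all`) mirror common ORM conventions but are assumptions, not MongoMapper's exact API.

```ruby
# Each call returns a new Query, so nothing runs until #all is called.
class Query
  def initialize(rows, conditions: {}, order: nil, offset: 0, limit: nil)
    @rows = rows
    @opts = { conditions: conditions, order: order, offset: offset, limit: limit }
  end

  def where(conds)
    with(conditions: @opts[:conditions].merge(conds))
  end

  def sort(field)
    with(order: field)
  end

  def skip(n)
    with(offset: n)
  end

  def limit(n)
    with(limit: n)
  end

  # Evaluate: filter by equality conditions, order, then offset and limit.
  def all
    result = @rows.select do |row|
      @opts[:conditions].all? { |field, value| row[field] == value }
    end
    result = result.sort_by { |row| row[@opts[:order]] } if @opts[:order]
    result = result.drop(@opts[:offset])
    @opts[:limit] ? result.first(@opts[:limit]) : result
  end

  private

  def with(changes)
    Query.new(@rows, **@opts.merge(changes))
  end
end

services = [
  { name: "Massage", category: "massage", price: 120 },
  { name: "Hot stone", category: "massage", price: 150 },
  { name: "Facial", category: "facial", price: 90 },
  { name: "Deep tissue", category: "massage", price: 100 }
]

Query.new(services).where(category: "massage").sort(:price).skip(1).limit(2).all.map { |s| s[:name] }
# => ["Massage", "Hot stone"]
```

Returning a new object from each call means a base query such as `Query.new(services).where(category: "massage")` can be reused with different orderings and offsets without the variants affecting one another.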
Filtering Complex Data Structures
•MongoDB offers a JavaScript API for MapReduce
•Map: transform and filter documents, emitting key/value pairs
•Reduce: combine the values emitted for the same key into a single record
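The map and reduce phases can be sketched in plain Ruby. MongoDB itself runs them as server-side JavaScript functions; this version only mirrors the contract, and the availability-slot data, field names and helper names are hypothetical. The map step doubles as a filter by emitting nothing for non-matching documents.

```ruby
# Hypothetical availability slots; starts_at is an hour of the day.
slots = [
  { service_id: 1, practitioner_gender: "female", starts_at: 9 },
  { service_id: 1, practitioner_gender: "male",   starts_at: 10 },
  { service_id: 1, practitioner_gender: "female", starts_at: 11 },
  { service_id: 2, practitioner_gender: "female", starts_at: 14 },
  { service_id: 2, practitioner_gender: "male",   starts_at: 15 }
]

# Map: drop non-matching slots, transform the rest into [key, value] pairs.
def map_slot(slot, gender)
  return [] unless slot[:practitioner_gender] == gender
  [[slot[:service_id], { count: 1, earliest: slot[:starts_at] }]]
end

# Reduce: fold all values for one key into a single record. The output has
# the same shape as the inputs, so it can be re-reduced with partial results
# (MongoDB requires this of reduce functions).
def reduce_slots(values)
  {
    count: values.sum { |v| v[:count] },
    earliest: values.map { |v| v[:earliest] }.min
  }
end

def map_reduce(docs, gender)
  emitted = docs.flat_map { |d| map_slot(d, gender) }
  emitted.group_by(&:first).transform_values do |pairs|
    reduce_slots(pairs.map(&:last))
  end
end

map_reduce(slots, "female")
# service 1 -> count 2, earliest 9; service 2 -> count 1, earliest 14
```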
A Simple Use-Case
Using MapReduce to Filter
Filter
The Results
•Scheduled to go live within 2 weeks
•With sharding/distribution, tests show almost no increase in response time with more than 10x the current data set
•20x faster than the MySQL implementation
•100 ms vs. 2000 ms (or more)