Search with a Key-Value Store

Post on 12-Jan-2016

Intro to NoSQL

•Key-value store

•Schemaless

•Distributed

•Eventually Consistent

Key-Value

•Single unique key for each value in the database

•Extremely fast look-up

•Easy distribution (no such thing as joins)
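As a rough illustration of the points above, here is a minimal in-memory sketch in plain Ruby. The class name TinyStore and the key format "story:dgi3ck" are made up for this example and do not belong to any real store's API.

```ruby
# A toy in-memory key-value store: one unique key maps to exactly one value.
class TinyStore
  def initialize
    @data = {}
  end

  # Writing to an existing key replaces its value, so keys stay unique.
  def put(key, value)
    @data[key] = value
  end

  # A lookup is a single hash access; no joins or table scans are involved.
  def get(key)
    @data[key]
  end
end

store = TinyStore.new
store.put("story:dgi3ck", { "headline" => "Hello" })
store.put("story:dgi3ck", { "headline" => "Hello again" })
puts store.get("story:dgi3ck")["headline"]
```

Because every read and write names exactly one key, nothing in this interface requires two records to live on the same machine.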

Schemaless

•Critical for extremely large data sets

•No alter table commands, each value has no pre-defined fields

Distributed

•Data set is designed to be shared across multiple machines

•Typically makes use of commodity servers with enough RAM to keep the entire data set in memory
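One simple way to spread keys across machines is to hash each key and take the result modulo the number of servers. The sketch below assumes three hypothetical shard names and uses a CRC32 checksum; it is only an illustration of the idea, not how any particular store (MongoDB included) places data.

```ruby
require "zlib"

# Hypothetical shard names; a real cluster would list server addresses.
SHARDS = %w[shard-a shard-b shard-c].freeze

# Pick a shard from a stable checksum of the key so every client agrees
# on where a given key lives.
def shard_for(key)
  SHARDS[Zlib.crc32(key) % SHARDS.size]
end

keys = %w[story:dgi3ck story:ab12cd user:42]
placement = keys.group_by { |key| shard_for(key) }
placement.each { |shard, ks| puts "#{shard}: #{ks.join(', ')}" }
```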

Eventually Consistent

•A write can return success to the client before replica nodes have received the change, so a read from a replica may briefly return stale data

•Makes NoSQL problematic for highly sensitive transactions (finance, etc.)
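The stale-read window can be simulated in a few lines of plain Ruby. ToyCluster and its methods are hypothetical names for this sketch; replication here is triggered by hand to stand in for an asynchronous process.

```ruby
# Toy primary/replica pair. Writes are acknowledged by the primary right away
# and copied to the replica only when replicate! runs later.
class ToyCluster
  def initialize
    @primary = {}
    @replica = {}
    @pending = []
  end

  def write(key, value)
    @primary[key] = value
    @pending << [key, value]
    :ok # success is reported before the replica has seen the change
  end

  def read_from_replica(key)
    @replica[key]
  end

  # Stands in for asynchronous replication catching up.
  def replicate!
    @pending.each { |key, value| @replica[key] = value }
    @pending.clear
  end
end

cluster = ToyCluster.new
cluster.write("balance:42", 100)
stale = cluster.read_from_replica("balance:42") # nil: the replica is behind
cluster.replicate!
fresh = cluster.read_from_replica("balance:42") # 100 after replication
```

A client that reads the balance between the write and the replication step sees the old state, which is why this model is a poor fit for financial transactions.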

Database Design in NoSQL

•Denormalization is your friend

•Think of collections as views on a data set that

A News Site Using SQL

Users (id, user_name, birthday)

Stories (id, date, headline, content)

Comments (id, story_id, user_id, content)

Loading a Story with SQL

SELECT * FROM stories WHERE id = x;

SELECT * FROM comments
LEFT JOIN users ON users.id = comments.user_id
LEFT JOIN comments children ON children.parent_id = comments.id
WHERE comments.story_id = x;

Redesigned in a NoSQL Data Store

Story #dgi3ck
  date
  headline
  content
  comments
    Comment #la529
      content
      username
      user_image_url
      user_id
      children
        Comment #mn34i
          content
          username
          user_image_url
          user_id
    Comment #5bg26
      content
      username
      user_image_url
      user_id
      children

Loading a Story with NoSQL

Stories::get("dgi3ck")
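To make the single-lookup idea concrete, the sketch below models the Story document as a nested Ruby Hash. The STORIES constant, the helper names, and all field values are placeholders invented for this example; a real application would fetch the document from the data store instead.

```ruby
# In-memory stand-in for the Stories collection, keyed by story id.
# All field values are made-up placeholders.
STORIES = {
  "dgi3ck" => {
    "headline" => "Example headline",
    "content"  => "Example body",
    "comments" => [
      { "id" => "la529", "username" => "alice", "content" => "First",
        "children" => [
          { "id" => "mn34i", "username" => "bob", "content" => "Reply",
            "children" => [] }
        ] },
      { "id" => "5bg26", "username" => "carol", "content" => "Second",
        "children" => [] }
    ]
  }
}.freeze

# One key lookup returns the story together with its whole comment tree.
def load_story(id)
  STORIES.fetch(id)
end

# Walk the embedded tree to count comments at every depth.
def count_comments(comments)
  comments.sum { |comment| 1 + count_comments(comment["children"]) }
end

story = load_story("dgi3ck")
puts count_comments(story["comments"]) # prints 3
```

The usernames travel with each comment, so rendering the page needs no second lookup against a users table.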

Some Design Considerations

•What is the context in which we will access this data?

•What data do we need to access outside of this context?

•How often does the data change?

Embedded Data

•NoSQL can support foreign keys

•Some data is more appropriately stored “embedded” in a parent context

•E.g. Comments are rarely (if ever) accessed outside of their parent Story

Cached Data

•Data from an object that needs to be accessed outside of the current context can be cached

•Keep in mind that it may need to be updated

•E.g. when a user changes their username, the username cached in their Comments can be updated
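Refreshing a cached value means walking every place it was copied to. A minimal sketch in plain Ruby, assuming comments are nested Hashes with "user_id", "username" and "children" keys (the helper name and sample data are hypothetical):

```ruby
# Rewrite the cached username on every comment written by user_id,
# descending into nested replies.
def refresh_cached_username(comments, user_id, new_name)
  comments.each do |comment|
    comment["username"] = new_name if comment["user_id"] == user_id
    refresh_cached_username(comment.fetch("children", []), user_id, new_name)
  end
end

story = {
  "comments" => [
    { "user_id" => 1, "username" => "old_name", "children" => [
      { "user_id" => 2, "username" => "bob", "children" => [] },
      { "user_id" => 1, "username" => "old_name", "children" => [] }
    ] }
  ]
}

refresh_cached_username(story["comments"], 1, "new_name")
```

The cost of this update grows with the number of copies, which is why the question of how often the data changes matters when deciding what to cache.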

Several common NoSQL Stores

•Memcached

•BigTable

•SimpleDB

•MongoDB

Why we chose MongoDB

•Auto-sharding and easy setup for distribution

•JavaScript API

•Powerful indexing capabilities

MongoDB Libraries

•ORM: mongo_mapper

• https://github.com/jnunemaker/mongomapper

•Underlying Connection: mongo

• https://github.com/mongodb/mongo-ruby-driver

•BSON support: bson_ext

• http://rubygems.org/gems/bson_ext

Lifebooker’s Availability Search

• Searches across Services

• Filters

• Time/Date

• Geographical Zone

• Service Category

• Practitioner Gender

• Concurrent Availability

• (and several more)

Services, Discounts and Practitioners

•Services are offered by Providers

•Providers have Practitioners (Employees)

•Discounts are applied to Providers for a Service in a given time

Modeling this Data in MongoDB

Embedding with MongoMapper

Indexing and Searching

•Mongo offers powerful indexing capabilities

•Arrays are “first-class citizens”

•Complex indices allow for great performance
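An index on an array field can be pictured as one entry per array element, each pointing back at the containing document. The plain-Ruby sketch below builds such a lookup table by hand; the document ids and the "categories" field are invented for illustration, and this is a conceptual model rather than MongoDB's actual index structure.

```ruby
# Sample documents; ids and category values are made up.
docs = {
  "svc1" => { "categories" => ["massage", "spa"] },
  "svc2" => { "categories" => ["spa"] },
  "svc3" => { "categories" => ["haircut"] }
}

# One index entry per array element, pointing back at the document id.
index = Hash.new { |hash, key| hash[key] = [] }
docs.each do |id, doc|
  doc["categories"].each { |category| index[category] << id }
end

puts index["spa"].inspect # prints ["svc1", "svc2"]
```

With the table in place, finding every service in a category is a single lookup rather than a scan over all documents.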

Creating Meta-Data

•With complex data structures, creating meta-data before_save will allow you to make that data easily searchable

•E.g. the maximum discount on a given day for a service
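The idea of computing searchable meta-data just before saving can be sketched in plain Ruby. ServiceRecord, its fields, and the sample discounts are hypothetical; in a MongoMapper model the same method would be registered as a before_save callback instead of being called by hand from save.

```ruby
# Plain-Ruby stand-in for a model with a before_save hook.
class ServiceRecord
  attr_reader :discounts, :max_discount_by_day

  # discounts: array of hashes such as { day: "mon", percent: 10 }
  def initialize(discounts)
    @discounts = discounts
    @max_discount_by_day = {}
  end

  def save
    before_save
    # A real model would write the document to the database here.
    true
  end

  private

  # Precompute the largest discount per day so a search can read one field
  # instead of scanning every discount.
  def before_save
    @max_discount_by_day = @discounts
      .group_by { |discount| discount[:day] }
      .transform_values { |list| list.map { |discount| discount[:percent] }.max }
  end
end

service = ServiceRecord.new([
  { day: "mon", percent: 10 },
  { day: "mon", percent: 30 },
  { day: "tue", percent: 15 }
])
service.save
puts service.max_discount_by_day.inspect
```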

Creating Indices

Querying

•Uses DataMapper/Arel Syntax

•Chains conditions, ordering and offset
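The chaining style can be illustrated without a database. ToyQuery below is a made-up class that mimics the pattern over an in-memory array: each call returns a new query object, and nothing is evaluated until all is called. It is a sketch of the pattern only, not MongoMapper's implementation, and the sample rows are invented.

```ruby
# Minimal chainable query over an in-memory array of hashes.
class ToyQuery
  def initialize(rows, filters = [], order = nil, skip_count = 0)
    @rows = rows
    @filters = filters
    @order = order
    @skip_count = skip_count
  end

  # Each call returns a new query, so calls can be chained in any order.
  def where(conditions)
    ToyQuery.new(@rows, @filters + [conditions], @order, @skip_count)
  end

  def sort(field)
    ToyQuery.new(@rows, @filters, field, @skip_count)
  end

  def skip(count)
    ToyQuery.new(@rows, @filters, @order, count)
  end

  # Nothing is evaluated until the results are requested.
  def all
    result = @rows.select do |row|
      @filters.all? { |cond| cond.all? { |key, value| row[key] == value } }
    end
    result = result.sort_by { |row| row[@order] } if @order
    result.drop(@skip_count)
  end
end

services = [
  { name: "Massage",  zone: "midtown", price: 80 },
  { name: "Haircut",  zone: "midtown", price: 40 },
  { name: "Facial",   zone: "soho",    price: 60 },
  { name: "Manicure", zone: "midtown", price: 25 }
]

query = ToyQuery.new(services).where(zone: "midtown").sort(:price).skip(1)
names = query.all.map { |row| row[:name] }
puts names.inspect # prints ["Haircut", "Massage"]
```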

Filtering Complex Data Structures

•MongoDB offers a JavaScript API for MapReduce

•Map - transform and filter data

•Reduce - combine multiple rows into a single record
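The two phases can be sketched in plain Ruby. MongoDB's own MapReduce functions are written in JavaScript, so this is only a conceptual model; the slot data, field names, and helper names are invented for illustration.

```ruby
# Availability slots; all values are made up.
slots = [
  { service: "massage", day: "mon", discount: 10 },
  { service: "massage", day: "mon", discount: 30 },
  { service: "haircut", day: "mon", discount: 5 },
  { service: "haircut", day: "tue", discount: 50 }
]

# Map: drop slots that fail the filter and emit [key, value] pairs.
def map_phase(slots, day)
  slots.select { |slot| slot[:day] == day }
       .map { |slot| [slot[:service], slot[:discount]] }
end

# Reduce: collapse all values emitted for a key into a single record.
def reduce_phase(pairs)
  pairs.group_by(&:first)
       .transform_values { |group| group.map(&:last).max }
end

best = reduce_phase(map_phase(slots, "mon"))
puts best.inspect
```

Here the map step does the filtering (only Monday slots survive) and the reduce step keeps the best discount per service.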

A Simple Use-Case

Using MapReduce to Filter

Filter

The Results

•Scheduled to go live within 2 weeks

•With sharding/distribution, tests show almost no dip in response time with more than 10x the current data set

•20x faster than the MySQL implementation

•100 ms vs. 2000 ms (or more) for the MySQL implementation
