NoSQL Now 2012: MongoDB Use Cases
MongoDB and NoSQL Use Cases
Dwight Merriman, 10gen
Trends
• More data
• Complex data
• Cloud computing + computer architecture trends ->
  – Many commodity-type servers rather than one large server; commodity-type storage
• Fast application start->deploy expectations ->
  – Agile software development methodologies / iteration
  – Service-oriented architectures
Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single-server performance
• Cloud-friendly
We Need
A way to scale out
A new data model
Approach
• A new data model gives us a way to scale, and a way to solve our development wants
• Goals for the data model:
  – Maintain data separation from code
  – Low friction and mapping cost to our programming language
  – Malleability for adapting to constant changes of the real world
  – Ability to deal with polymorphic data
Approach
• Rich documents + partitioning
  – Each document lives on one shard (partitioning)
• The catch:
  – No complex transactions
Wants
• Horizontal scaling
• Ability to store complex data and deal with the malleability of real-world schemas without pain
• Works with my (object-oriented) programming language without friction
• Works with my frequent release cycles (iteration) without friction
• High single-server performance
• Cloud-friendly
• Caveats / trade-offs:
  – No complex transactions
Thus implying use cases…
When should you consider using MongoDB?
• You find yourself coding around database performance issues – for example, adding lots of caching.
• You are storing data in flat files.
• You are batch processing yet you need real-time.
• You are doing agile development, e.g., Scrum.
• Your data is complex to model in a relational DB, e.g., a complex derivative security; electronic health records, ...
• Your project is late :-)
• You are forced to use expensive SANs, proprietary servers, or proprietary networks for your existing DB.
• You are deploying to a public or private cloud.
When should you use something else?
• Problems requiring SQL.
• Systems with a heavy emphasis on complex transactions, such as banking systems and accounting.
• Traditional non-realtime data warehousing (sometimes). Traditional relational data warehouses and variants (columnar relational) are well suited for certain business intelligence problems – especially if you need SQL for your client tool (e.g. MicroStrategy). Exceptions where MongoDB is good are:
  – cases where the analytics are realtime
  – cases where the data is very complicated to model in relational
  – when the data volume is huge
  – when the source data is already in a mongo database
Evolution of the data layer:
• The beginning: RDBMS
• Last 10 years: RDBMS + Data Warehouse
• Today: RDBMS + Data Warehouse + NoSQL DB
Example users
• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• Meta Data Management
High Volume Data Feeds
• Machine Generated Data
  – More machines, more sensors, more data
  – Variably structured
• Stock Market Data
  – High frequency trading
• Social Media Firehose
  – Multiple sources of data
  – Each changes their format constantly
Operational Intelligence
• Ad Targeting
  – Large volume of state about users
  – Very strict latency requirements
• Real-time dashboards
  – Expose report data to millions of customers
  – Report on large volumes of data
  – Reports that update in real time
• Social Media Monitoring
  – What are people talking about?
Problem
• Intuit hosts more than 500,000 websites.
• Wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers.
• With 10 years' worth of user data, it took several days to process the information using a relational database.
Why MongoDB
• In one week Intuit was able to become proficient in MongoDB development.
• Developed application features more quickly for MongoDB than for relational databases.
• MongoDB was 2.5 times faster than MySQL.
Impact
• Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic.

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" – Nirmala Ranganathan, Intuit
Marketing Personalization
Funnel: (1) See Ad -> (2) See Ad -> (3) Click -> (4) Convert
{
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Indexing and querying to support matching, frequency capping
• Dynamic schemas make it easy to track vendor-specific attributes
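The frequency-capping bullet can be made concrete against a profile document of the shape shown above. This is a hypothetical sketch in plain JavaScript (the advertiser name, cap value, and helper function are illustrative, not from the deck):

```javascript
// Hypothetical frequency-capping check over an ad-targeting profile
// document shaped like the slide's example.
const profile = {
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
      ],
    },
  },
};

// Count how many times this cookie has been shown a given ad.
function impressionCount(profile, advertiser, ad) {
  const actions = profile.advertiser[advertiser].actions;
  return actions.filter((a) => a.impression === ad).length;
}

const CAP = 3; // illustrative cap
const canServe = impressionCount(profile, "apple", "ad2") < CAP;
console.log(canServe); // true: ad2 has one impression, under the cap of 3
```

In production this check would be a single indexed query against the profile document rather than an in-memory filter, which is why the rich-document model fits: all of a cookie's actions live in one place.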
Meta Data Management
• Data Archiving
  – Meta data about artifacts
  – Content in the library
• Information discovery
  – Have data sources that you don't have access to
  – Store meta-data on those stores and figure out which ones have the content
• Biometrics
  – Retina scans
  – Fingerprints
Meta data
{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt" }

{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC" }
• Flexible data model for similar, but different objects
• Indexing and rich query API for easy searching and sorting

db.archives.find({ "country": "Egypt" });
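The point of that query is that one predicate matches documents with different shapes. Here is a minimal in-memory stand-in for `find()` (equality matching only, an assumption for brevity) showing that both the Book and the Artefact documents match despite having different fields:

```javascript
// Illustrative in-memory version of the slide's query: one predicate
// over a collection of polymorphic documents.
const archives = [
  { ISBN: "00e8da9b", type: "Book", country: "Egypt", title: "Ancient Egypt" },
  { type: "Artefact", medium: "Ceramic", country: "Egypt", year: "3000 BC" },
  { type: "Book", country: "Italy", title: "Rome" },
];

// Minimal equality-only matcher in the spirit of find(); the real
// query language also supports operators like $gt, $in, etc.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([k, v]) => doc[k] === v)
  );
}

const egyptian = find(archives, { country: "Egypt" });
console.log(egyptian.length); // 2: both the Book and the Artefact match
```

Fields a document lacks simply fail the per-field check; no schema migration is needed to store the two shapes side by side.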
Problem
• Managing 20 TB of data (six billion images for millions of customers), partitioning by function.
• Home-grown key-value store on top of their Oracle database offered sub-par performance.
• Codebase for this hybrid store became hard to manage.
• High licensing and hardware costs.
Why MongoDB
• JSON-based data structure provided Shutterfly with an agile, high-performance, scalable solution at a low cost.
• Works seamlessly with Shutterfly's services-based architecture.

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation.
• Accelerated time-to-market for nearly a dozen projects on MongoDB.
• Improved performance by reducing average latency for inserts from 400 ms to 2 ms.
Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes
The “really killer reason” for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. -Kenny Gorman, Director of Data Services
Content Management
• News Site
  – Comments and user-generated content
  – Personalization of content, layout
• Multi-Device rendering
  – Generate layout on the fly for each device that connects
  – No need to cache static pages
• Sharing
  – Store large objects
  – Simple modeling of metadata
Content Management
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ] }

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z") }

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: { dimensions: [ 124, 52 ], units: "pixels" } }
• Flexible data model for similar, but different objects
• Horizontal scalability for large data sets
• Geospatial indexing for location-based searches
• GridFS for large object storage
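The location-based-search bullet can be illustrated without a server. The sketch below filters photo documents (shaped like the examples above) by great-circle distance from a point; MongoDB would do this with a 2dsphere index and a `$near` query, so this plain-JS haversine filter is only a stand-in for the idea. The query point and radius are made up for the example:

```javascript
// Stand-in for a location-based photo search. Coordinates follow the
// slide's [longitude, latitude] convention.
const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] },    // San Francisco
  { camera: "Canon 5d mkII", location: [-73.9857, 40.7484] }, // New York
];

// Rough great-circle distance in km (haversine formula).
function distanceKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

// "Find photos within maxKm of point" - what a $near query expresses.
const near = (point, maxKm) =>
  photos.filter((p) => distanceKm(p.location, point) <= maxKm);

console.log(near([-122.4194, 37.7749], 50).length); // 1: only the SF photo
```

A geospatial index makes this kind of lookup fast at scale instead of scanning every document, which is the point of the bullet above.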
Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources.
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts.
• Initially launched entirely on MySQL but quickly hit performance roadblocks.
Why MongoDB
• Reduced code by 75% compared to MySQL.
• Fetch time cut from 400 ms to 60 ms.
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second.

Impact
• Migrated 5 billion records in a single day with zero downtime.
• MongoDB powers every website request: 20M API calls per day.
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error.
• Significant cost savings and 15% reduction in servers.

"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." – Tony Tam, Vice President of Engineering and Technical Co-founder
Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5T of data in 20 billion records
www.10gen.com
www.mongodb.org
Dwight Merriman, 10gen