How MySQL and NoSQL Coexist - Matt Yonkovit, Percona
Who Am I
Matt Yonkovit - Principal Consultant, Percona
➔ MySQL -> Sun Veteran
➔ 15+ years Database Experience
➔ Lots of fun at parties
MySQL focus: scale up
Look at the advances made within InnoDB
•Scalability to 32+ cores
•SSD enhancements
•Dealing with mega-sized buffer pools
This fits with hardware
•Multi-core
•Lots of memory standard
•Cheap Flash
Industry Says Scale Out
Cloud Computing
•Rise of EC2
•Scale on demand
•Pay for what you need
Big companies run 1000's of servers
Big Data retention drives more scale out
•Retention and analysis outpace hardware growth
This is part of the cyclical nature of computing. I remember hearing this from Oracle when RAC was first introduced; before that it was time sharing.
Developers want easy and fast
SQL Is Powerful
But SQL can be overly complex for many operations
Sometimes optimizers in RDBMSs do stupid things and developers can do it better
SQL is yet another component a developer needs to learn
Even for Simple SQL operations there is overhead to parse/execute
ORM Sucks
Because developers want to focus on being super developer ninjas … they often turn to ORMs
ORMs (like Active Record) work great for simple applications, but tend to bork when you have complex mappings
Fast Moving Changes
What data is stored, and how it is stored, is changing at a breakneck pace
Changes to large databases are hard
•Example: in a presentation, Craigslist said their archive DB took 1 month to alter a table.
•Other alters can still take days or weeks
Missed opportunities
While the MySQL community as a whole has done an awesome job, we did miss a few things:
● Online Table Alters (Add/Mod column)
● Scale out -vs- Scale up
● Flexible data types
The Rise of NoSQL
RDBMSs of old did not keep up with the demand
Need for fast, efficient data access
Eliminate the pain points
Eliminate the unneeded fluff
•Many websites got along well without things like functions, stored procs, full ACID compliance, etc.
NoSQL Covers a lot
Key/Value
•e.g. memcached, Redis
Column Stores
•e.g. HBase
Document Stores
•e.g. Mongo, Couch
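To make the three categories concrete, here is a minimal sketch of one made-up user record laid out in each model. Plain Python structures stand in for the real stores; the record, the key names, and the find_by_city helper are all assumptions for illustration, not any product's API.

```python
import json

# Key/value: one opaque value per key; the application encodes and decodes it.
kv_store = {}
kv_store["user:42"] = json.dumps({"name": "Ann", "city": "Austin"})

# Column store: row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Ann", "city": "Austin"},
        "stats": {"logins": 17},
    }
}

# Document store: nested, self-describing documents that can be queried by field.
document_store = [
    {"_id": 42, "name": "Ann", "address": {"city": "Austin"}, "tags": ["dba", "mysql"]}
]


def find_by_city(docs, city):
    """Return documents whose nested address.city matches (hypothetical helper)."""
    return [d for d in docs if d.get("address", {}).get("city") == city]
```

The key/value form can only be fetched by its key, while the document form can be filtered on a nested field, which is roughly the trade-off between the categories.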
More “developer” centric interface
Instead of relying on a SQL interface, many NoSQL solutions allow developers to stay in code and directly access objects using their programming language of choice
•e.g. MongoDB's JavaScript interface
Allow data to be retrieved in an easily consumable format
•e.g. JSON, binary object, etc.
Scale out
Remove the complexities and automate sharding
Make full use of multiple servers for complex tasks
•e.g. map reduce
Ability to add servers on demand
Support for fail-over and replication
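For readers who have not seen map reduce, a single-process sketch of the idea follows. It is a toy word count, not any product's implementation; the function names are assumptions, and a real system would run the map and reduce phases on many servers.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for each word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over a list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over several machines.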
Easy to change
Change is a certainty in life... Users will demand it!
Make sure that you are not bound to a rigid structure
•Allow for changes on the fly “flexible schema”
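A small sketch of what "flexible schema" tends to mean in practice, assuming documents are plain dictionaries: a new field is simply written on new documents, and the reader supplies a default for old ones. The field name and the read_field helper are made up for illustration.

```python
def read_field(doc, field, default=None):
    """Treat a missing field as its default instead of altering stored data."""
    return doc.get(field, default)


users = [
    {"_id": 1, "name": "Ann"},                     # written before the change
    {"_id": 2, "name": "Bob", "twitter": "@bob"},  # written after the change
]

# No ALTER TABLE was needed, but the code must cope with both shapes.
handles = [read_field(u, "twitter") for u in users]
```

The cost of the change moves from the database to the application, which now has to handle every historical document shape.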
Custom Features
Add support for features missing in MySQL or that solve a specific need
•GridFS (Mongo)
•Super Columns (Cassandra)
•Lists/Sets (Redis)
Too Much FUD
A lot of developers like to hate on MySQL
A NoSQL solution (like Mongo) does not mean:
•You do not have to think about your data types and “schema” design
•Adding nodes will solve all your issues
•You can get sloppy with code
•It will fully replace a relational DB
Bad Code is Bad Code
As complex as you make it
You still have to think with NoSQL
•How will this data be used?
•Will we have to correlate this data with other data in the system
•Will duplication of data cause issues? Bloat space?
Too Much FUD
A lot of people in MySQL hate on NoSQL
Not all “NOSQL” solutions are made the same
•Some of them are durable, or are adding durability features (e.g. Mongo in 1.8+)
•While in most “NOSQL” more memory = better performance, the same holds true for MySQL
•Not every solution is eventually consistent
Remember MySQL started as “that other technology”, mocked by classic DBAs
Many of these solutions are at similar places where MySQL was 6-7 years ago.
Lessons from NDB
In Many ways MySQL has had a “NoSQL” solution available for years
NDB (MySQL Cluster) is at its core a “NoSQL” solution with a SQL wrapper
Many deployments of NDB need to bypass the SQL layer and write directly to the NDB API in order to achieve optimal performance
Parsing SQL Can be slow
libmysql (Akira’s Numbers)
samples   %        symbol name
748022    7.7355   MYSQLParse(void*)
219702    2.2720   my_pthread_fastmutex_lock
205606    2.1262   make_join_statistics(…)
198234    2.0500   btr_search_guess_on_hash
Sources:
http://www.mysqlperformanceblog.com/2011/03/16/where-does-handlersocket-really-save-you-time/
http://www.slideshare.net/akirahiguchi/handlersocket-20100629en-5698215
Handler Socket
Developed by DeNA
Direct access to the InnoDB storage engine, bypassing SQL
Key-Value type access
Yoshinori hit 750K QPS, faster than memcached, and 7X faster than stock SQL
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
Handler Socket Performance
Benchmark by Vadim: http://www.mysqlperformanceblog.com/2010/11/02/handlersocket-on-ssd/
Memcached
Oracle recently got in on the fun, introducing a Memcached interface for InnoDB
Key-Value access
Uses libmemcached (known and used protocol)
Promises the ability to use the distributed hash capabilities of memcached to shard (not available yet)
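The idea behind sharding with a distributed hash can be sketched in a few lines. This is simple modulo hashing, shown only to illustrate the principle; it is not the memcached client algorithm or the InnoDB plugin's behavior, and the server names and the server_for function are assumptions.

```python
import hashlib

SERVERS = ["db1:11211", "db2:11211", "db3:11211"]  # hypothetical node names


def server_for(key, servers=SERVERS):
    """Pick a server from a stable hash of the key (simple modulo hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Every client that shares the same server list sends a given key to the same node, with no central lookup. With plain modulo hashing, changing the number of servers remaps most keys, which is why consistent hashing schemes are often preferred.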
What's important to you?
Question: Should you use MySQL or look at a NoSQL solution?
Answer: It depends on your application.
Questions you should ask
● Do you need transactions?
● Can you risk data loss?
● What are your performance requirements?
● What level of risk is acceptable?
● Costs involved?
  ● Hard costs: Servers, Infrastructure
  ● Soft costs: Developer Time
General MySQL Considerations
● Durability important (Is some data loss acceptable?)
● 3rd party applications
● Integration with other RDBMSs
● Already invested in SQL?
● OLTP type workloads?
● Transactions
● Joins
General NoSQL Considerations
SQL Is Overkill
•Simple Key Values?
Document Oriented
•No Standard Form or Consistent data stream
Require CPU Based resources from several machines
Huge amounts of archived data
Rapid Changes to the structures
Possible data inconsistencies
•Eventual consistency is a by-product
Performance?
Performance for both MySQL and NoSQL solutions can vary wildly.
Impacted by:
•Schema/Object design
•Datatypes
•Indexes
•ORM
•Drivers
In many cases you can trade performance for reliability/consistency
Space
With huge datasets, space can be at a premium
Some NoSQL options can take up a lot more space than their MySQL counterparts.
XML Example (no redundant data):
•2.5GB in MongoDB
•486MB in InnoDB
Some solutions have you duplicate data for performance and simplicity
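One plausible contributor to a larger footprint is that self-describing documents carry their field names in every record, while a table stores column names once. The slides do not say what caused the gap in the XML example, so the sketch below only illustrates that single effect on made-up data; it does not reproduce the 2.5GB and 486MB figures.

```python
import json

columns = ("id", "username", "city")
rows = [(i, "user%d" % i, "Austin") for i in range(1000)]  # made-up sample data

# Row-style: column names are stored once, then only the values per row.
row_bytes = len(json.dumps(columns)) + sum(len(json.dumps(r)) for r in rows)

# Document-style: every document repeats its own field names.
doc_bytes = sum(len(json.dumps(dict(zip(columns, r)))) for r in rows)

overhead = doc_bytes / row_bytes  # ratio above 1 means the documents are larger
```

Short field names and fewer tiny fields shrink this overhead, which is one reason object design shows up in both the space and the performance discussion.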
Verify
I have run into a lot of people who leap before they test
•Asking for trouble
Don't be trapped by legacy problems
•If you have bad SQL code, it's not MySQL's fault ;)
Use specific benchmarks, not generic ones
MongoDB
Nice JSON-centric data storage
•Do not underestimate
MapReduce capabilities
Better durability in version 1.8+
Built in sharding
Replication
No Native Joins
Trade performance for consistency
Data footprint can be bigger than MySQL in some cases
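"No native joins" means the correlation work moves into application code. A minimal sketch of that, assuming two collections have already been fetched as lists of dictionaries; the collection names, fields, and the join_orders_to_users helper are made up for illustration and no MongoDB driver is involved.

```python
users = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]
orders = [
    {"_id": 100, "user_id": 1, "total": 25},
    {"_id": 101, "user_id": 2, "total": 40},
    {"_id": 102, "user_id": 9, "total": 5},  # no matching user
]


def join_orders_to_users(orders, users):
    """Hash join in application code: index one side, then probe with the other."""
    users_by_id = {u["_id"]: u for u in users}
    return [
        {
            "order_id": o["_id"],
            "total": o["total"],
            "user_name": users_by_id[o["user_id"]]["name"],
        }
        for o in orders
        if o["user_id"] in users_by_id  # inner-join semantics: drop unmatched orders
    ]


joined = join_orders_to_users(orders, users)
```

This costs at least two round trips plus memory for the index, which is why document designs often embed or duplicate the related data instead.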
Redis
Pros:
Fast when fully in memory, but can use “virtual memory”
Supports “clustering”
Supports replication
Super fast
Complex data types like lists
Cons:
Using Virtual Memory prevents using other features
Data loss possible
No Native joins
Key Access only
Cassandra
Pros:
Auto-Sharding of data
Replication
Parallel Processing
Super Columns are interesting
Cons:
Eventual Consistency
Documentation not as deep as other solutions
Speed is a by-product of adding more nodes
Network Chatty
Durability issues
Hadoop
Pros:
Great with Super Large Datasets (Petabytes)
Extreme Parallelism
Can do complex CPU intensive tasks via map reduce
Cons:
Can be complex
Needs time to produce results
No Native Joins/Indexes
Tokyo Cabinet
Pros:
Can Be Super Fast
Multiple Table Types to support different requirements
Embeddable
Replication
Can use Memcached protocol
Cons:
No Built in Sharding
Durability/Consistency issues
Lack of Documentation
Can Bog down if you run out of memory
Couch
Resistant to corruption
MVCC Based (Versioning)
Javascript
Rest Interface
Replication
Map Reduce
Easy to Get started
No Sharding
Can be slower than other solutions