How MySQL and NoSQL Coexist Matt Yonkovit - Percona

Post on 04-Apr-2018


Matt Yonkovit - Percona

www.percona.com

Thank you to our Diamond Sponsors

Who Am I

Matt Yonkovit - Principal Consultant, Percona

➔ MySQL->Sun Veteran

➔ 15+ years Database Experience

➔ Lots of fun at parties

What We do

Where MySQL and the Ecosystem are today:

Lots of Data

MySQL focus: scale up

Look at the advances made within InnoDB

•Scalability to 32+ cores

•SSD enhancements

•Dealing with mega-sized buffer pools

This fits with hardware

•Multi-core

•Lots of memory standard

•Cheap Flash

We Need More Servers!

Industry Says Scale Out

Cloud Computing

•Rise of EC2

•Scale on demand

•Pay for what you need

Big companies run 1000's of servers

Big Data retention drives more scale out

•Retention and analysis outpace hardware growth

This is part of the cyclical nature of computing. I remember hearing this from Oracle when RAC was first introduced; before that it was time-sharing.

Developers want easy and fast

SQL Is Powerful

But SQL can be overly complex for many operations

Sometimes optimizers in RDBMSs do stupid things, and developers can do it better

SQL is yet another component a developer needs to learn

Even for Simple SQL operations there is overhead to parse/execute

ORM Sucks

Because developers want to focus on being super developer ninjas, they often turn to ORMs

ORMs (like Active Record) work great for simple applications, but tend to bork when you have complex mappings

Fast Moving Changes

What data is stored, and how it is stored, is changing at a breakneck pace

Changes to large databases are hard

•Example: In a presentation, Craigslist said their archive db took 1 month to alter a table.

•Other alters can still take days or weeks

Missed opportunities

While the MySQL community as a whole has done an awesome job, we did miss a few things:

● Online Table Alters (Add/Mod column)
● Scale out -vs- Scale up
● Flexible data types

How other groups tackled the problems:

The Rise of NoSQL

RDBMSs of old did not keep up with the demand

Need for fast, efficient data access

Eliminate the pain points

Eliminate the unneeded fluff

•Many websites got along well without things like functions, stored procs, full ACID compliance, etc.

NoSQL Covers a lot

Key/Value

•e.g. memcached, Redis

Column Stores

•e.g. HBase

Document Stores

•e.g. Mongo, Couch

More “developer” centric interface

Instead of relying on a SQL interface, many NoSQL solutions allow developers to stay in code and directly access objects using their programming language of choice

•e.g. MongoDB's JavaScript interface

Allow data to be retrieved in an easily consumable format

•e.g. JSON, binary object, etc.

Scale out

Remove the complexities and automate sharding

Make full use of multiple servers for complex tasks

•e.g. map reduce

Ability to add servers on demand

Support for fail-over and replication
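As a rough illustration of the map reduce idea on this slide, here is a minimal single-process sketch. The three lists are hypothetical stand-ins for the data held on three servers, and the function names are made up for the example; in a real system the map step would run on each server in parallel and only the small partial results would cross the network.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands in for the rows held by one server.
SHARDS = [
    ["mysql", "mongo", "redis"],
    ["mysql", "mysql", "couch"],
    ["redis", "mongo", "mysql"],
]


def map_phase(rows):
    """Runs independently on each server: count the values it holds locally."""
    return Counter(rows)


def reduce_phase(left, right):
    """Merges two partial results into one."""
    return left + right


def count_values(shards):
    # In a real deployment each map_phase call would run on a different server.
    partials = [map_phase(rows) for rows in shards]
    return reduce(reduce_phase, partials, Counter())


totals = count_values(SHARDS)
print(totals.most_common())
```

The point of the split is that adding a server adds another independent map step, which is what makes "add servers on demand" useful for CPU-heavy work.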

Easy to change

Change is a certainty in life... Users will demand it!

Make sure that you are not bound to a rigid structure

•Allow for changes on the fly “flexible schema”
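To make "flexible schema" concrete, the sketch below contrasts the two approaches. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL, a plain list of JSON strings stands in for a document collection, and the `users` table and `twitter` attribute are invented for the example.

```python
import json
import sqlite3

# Relational side: sqlite3 (standard library) stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
# A new attribute needs a schema change before any row can carry it.
# On a very large table this is the step that can take days or weeks.
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO users (name, twitter) VALUES ('bob', '@bob')")
rows = conn.execute("SELECT name, twitter FROM users ORDER BY id").fetchall()

# Document side: a list of JSON documents stands in for a collection.
collection = []
collection.append(json.dumps({"name": "alice"}))
# The new attribute simply appears in the documents that have it.
collection.append(json.dumps({"name": "bob", "twitter": "@bob"}))
docs = [json.loads(d) for d in collection]
handles = [d.get("twitter") for d in docs]

print(rows)
print(handles)
```

Note that the application still has to cope with documents that lack the new field, which is why `d.get("twitter")` is used instead of `d["twitter"]`.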

Custom Features

Add support for features missing in MySQL or that solve a specific need

•GridFS (Mongo)

•Super Columns (Cassandra)

•Lists/Sets (Redis)

Where are we now?

SQL -vs- NOSQL?

Too Much FUD

A lot of developers like to hate on MySQL

A NoSQL solution (like Mongo) does not mean:

•You do not have to think about your data types and “schema” design

•Adding nodes will solve all your issues

•You can get sloppy with code

•You can fully replace a relational db

Bad Code is Bad Code

As complex as you make it

You still have to think with NoSQL

•How will this data be used

•Will we have to correlate this data with other data in the system

•Will duplication of data cause issues? Bloat space?
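The duplication question can be sketched with plain Python dictionaries. Everything here is hypothetical (the `authors` and `posts` structures and both helper functions); it only shows why copying a value into every document makes reads simple but turns one logical change into many writes.

```python
# Normalized: the author name is stored once and posts refer to it by id.
authors = {1: {"name": "matt"}}
posts_normalized = [
    {"title": "Scale up", "author_id": 1},
    {"title": "Scale out", "author_id": 1},
]

# Denormalized: the author name is copied into every post document.
posts_embedded = [
    {"title": "Scale up", "author_name": "matt"},
    {"title": "Scale out", "author_name": "matt"},
]


def rename_normalized(authors, author_id, new_name):
    """One write, no matter how many posts the author has."""
    authors[author_id]["name"] = new_name
    return 1


def rename_embedded(posts, old_name, new_name):
    """One write per document that carries a copy of the name."""
    touched = 0
    for post in posts:
        if post["author_name"] == old_name:
            post["author_name"] = new_name
            touched += 1
    return touched


writes_normalized = rename_normalized(authors, 1, "matthew")
writes_embedded = rename_embedded(posts_embedded, "matt", "matthew")
print(writes_normalized, writes_embedded)
```

Reading a post with its author name is a single lookup in the embedded form and two lookups in the normalized form, so the right choice depends on how the data will be used.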

Too Much FUD

A lot of people in MySQL hate on NoSQL

Not all “NoSQL” solutions are made the same

•Some of them are durable, or are adding durability features (e.g. Mongo in 1.8+)

•While in most “NoSQL” solutions more memory = better performance, the same holds true for MySQL

•Not every solution is eventually consistent

Remember MySQL started as “that other technology”, mocked by classic DBAs

Many of these solutions are in a similar place to where MySQL was 6-7 years ago.

MySQL is not Sitting Idle

Lessons from NDB

In many ways MySQL has had a “NoSQL” solution available for years

NDB (MySQL Cluster) is at its core a “NoSQL” solution with a SQL wrapper

Many deployments of NDB need to bypass the SQL layer and write directly to the NDB API in order to achieve optimal performance

Parsing SQL can be slow

libmysql (Akira’s numbers)

samples   %        symbol name
748022    7.7355   MYSQLParse(void*)
219702    2.2720   my_pthread_fastmutex_lock
205606    2.1262   make_join_statistics(…)
198234    2.0500   btr_search_guess_on_hash

Sources:

http://www.mysqlperformanceblog.com/2011/03/16/where-does-handlersocket-really-save-you-time/

http://www.slideshare.net/akirahiguchi/handlersocket-20100629en-5698215
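A quick way to read the profile above is to back out the total number of samples from each row and to add up the listed shares. The short calculation below uses only the four rows shown on the slide; the variable and function names are made up for the example.

```python
# (samples, percent of all samples, symbol) exactly as listed in the profile.
PROFILE = [
    (748022, 7.7355, "MYSQLParse(void*)"),
    (219702, 2.2720, "my_pthread_fastmutex_lock"),
    (205606, 2.1262, "make_join_statistics(…)"),
    (198234, 2.0500, "btr_search_guess_on_hash"),
]


def implied_total(samples, percent):
    """Total samples in the whole profile implied by one row."""
    return samples / (percent / 100.0)


implied_totals = [implied_total(s, p) for s, p, _ in PROFILE]
listed_share = sum(p for _, p, _ in PROFILE)
parser_vs_next = PROFILE[0][0] / PROFILE[1][0]

for (samples, percent, symbol), total in zip(PROFILE, implied_totals):
    print(f"{symbol:28s} {percent:7.4f}%  implies ~{total:,.0f} total samples")
print(f"listed rows together: {listed_share:.4f}% of samples")
print(f"parser vs next hottest symbol: {parser_vs_next:.1f}x")
```

Each row implies roughly 9.67 million samples in total, so the rows are consistent with one another, and the parser alone accounts for more than three times the samples of the next symbol in the list.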

Handler Socket

Developed by DeNA

Direct access to the InnoDB storage engine, bypassing SQL

Key-Value type access

Yoshinori hit 750K QPS, faster than memcached, and 7X faster than stock SQL

http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html

Handler Socket Performance

Benchmark by Vadim: http://www.mysqlperformanceblog.com/2010/11/02/handlersocket-on-ssd/

Memcached

Oracle recently got in on the fun, introducing a Memcached interface for InnoDB

Key-Value access

Uses libmemcached (a known and widely used protocol)

Promises the ability to use the distributed hash capabilities of memcached to shard (not available yet)

Question: Should you use MySQL or look at a NoSQL solution?

Answer: It depends on your application.

What's important to you?

Questions you should ask

● Do you need transactions?
● Can you risk data loss?
● What are your performance requirements?
● What level of risk is acceptable?
● Costs involved?
    ● Hard costs: Servers, Infrastructure
    ● Soft costs: Developer Time

General MySQL Considerations

● Durability important (Is some data loss acceptable?)
● 3rd Party applications
● Integration with other RDBMSs
● Already invested in SQL?
● OLTP type Workloads?
● Transactions
● Joins

General NoSQL Considerations

SQL Is Overkill

•Simple Key Values?

Document Oriented

•No Standard Form or Consistent data stream

Require CPU Based resources from several machines

Huge amounts of archived data

Rapid Changes to the structures

Possible data inconsistencies

•Eventual consistency by product

Performance?

Performance for both MySQL and NoSQL solutions can vary wildly.

Impacted by:

•Schema/Object design

•Datatypes

•Indexes

•ORM

•Drivers

In many cases you can trade performance for reliability/consistency

Space

With huge datasets, space can be at a premium

Some NoSQL options can take up a lot more space than their MySQL counterparts.

XML Example (no redundant data):

•2.5GB in MongoDB

•486MB in InnoDB

Some solutions have you duplicate data for performance and simplicity

Thou shall Benchmark

Trust, but verify

Verify

I have run into a lot of people who leap before they test

•Asking for trouble

Don't be trapped by legacy problems

•If you have bad SQL code, it's not MySQL's fault ;)

Use specific benchmarks, not generic ones
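One way to act on "specific, not generic" is to time the queries your application actually runs against data shaped like your own. The harness below is a minimal sketch, not a real benchmark tool: sqlite3 from the standard library stands in for the database under test, and the `orders` table, the row count, and both queries are hypothetical placeholders for your own workload.

```python
import sqlite3
import time


def build_db(n_rows=10_000):
    """Create a throwaway in-memory table shaped like the (hypothetical) app data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 100, float(i)) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def bench(conn, sql, params, repeats=200):
    """Run one application query repeatedly and return queries per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return repeats / elapsed


conn = build_db()
# Two queries the imaginary application issues: a primary key lookup
# and an aggregate over an unindexed column.
qps_lookup = bench(conn, "SELECT total FROM orders WHERE id = ?", (42,))
qps_aggregate = bench(
    conn, "SELECT SUM(total) FROM orders WHERE customer_id = ?", (7,)
)
print(f"lookup:    {qps_lookup:,.0f} queries/sec")
print(f"aggregate: {qps_aggregate:,.0f} queries/sec")
```

The numbers themselves mean nothing outside this toy setup; the useful part is the shape: the same harness, pointed at each candidate store with your real queries and data volumes, gives results you can compare.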

NoSQL Options

MongoDB

Nice JSON-centric data storage

•Do not underestimate

MapReduce capabilities

Better durability in version 1.8+

Built in sharding

Replication

No Native Joins

Trade performance for consistency

Data footprint can be bigger than MySQL in some cases

Redis

Pros:

Fast when fully in memory, but can use “virtual memory”

Supports “clustering”

Supports replication

Super fast

Complex data types like lists

Cons:

Using Virtual Memory prevents using other features

Data loss possible

No Native joins

Key Access only

Cassandra

Pros:

Auto-Sharding of data

Replication

Parallel Processing

Super Columns are interesting

Cons:

Eventual Consistency

Documentation not as deep as other solutions

Speed is a by-product of adding more nodes

Network Chatty

Durability issues

Hadoop

Pros:

Great with Super Large Datasets (Petabytes)

Extreme Parallelism

Can do complex CPU intensive tasks via map reduce

Cons:

Can be complex

Needs time to produce results

No Native Joins/Indexes

Tokyo Cabinet

Pros:

Can Be Super Fast

Multiple Table Types to support different requirements

Embeddable

Replication

Can use Memcached protocol

Cons:

No Built in Sharding

Durability/Consistency issues

Lack of Documentation

Can Bog down if you run out of memory

Couch

Resistant to corruption

MVCC Based (Versioning)

Javascript

Rest Interface

Replication

Map Reduce

Easy to Get started

No Sharding

Can be slower than other solutions

MySQL Tricks

There are ways to solve the “NoSQL” issues in MySQL, but they are manual or add complexity

•Storage engines

•Shard Query

•XML Data

•Sharding

•NDB/Cluster
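To show what "manual" sharding means in practice, here is a minimal sketch of application-level routing. It is an assumption-laden illustration: the shard names are invented, and the hash-modulo scheme is just one simple way to do it, not the approach of any particular tool on this list.

```python
import hashlib

# Hypothetical shard names; in practice these would map to MySQL servers.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def moved_keys(keys, old_shards, new_shards):
    """Count keys that land on a different shard after the shard list changes."""
    return sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )


user_ids = range(1000)
placement = {uid: shard_for(uid) for uid in user_ids}
# Adding one server changes the modulus, so many keys need to be migrated.
moved = moved_keys(user_ids, SHARDS, SHARDS + ["db4"])
print(f"{moved} of {len(placement)} keys move when a fifth shard is added")
```

The routing function is tiny, but the count it prints is the added complexity the slide refers to: with this simple scheme, growing the cluster means migrating a large share of the data, and the application owns that work.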