databases and the cloud
2012-02-27 TTKK 1
Databases and the cloud
Henrik Ingo
TTKK, 2012-02-27
Please share and re-use this presentation, licensed under the Creative Commons Attribution license.
2011-10-25 2
History
...things that used to be difficult at different points in time.
1994
USSR fell 3 years ago
Finland yet to win first gold medal in hockey
Windows 3.0 replacing MS-DOS
Windows 95 did not yet exist
I learn how to write .bat scripts from my dad
Word 6.0 replacing WordPerfect 5.1
Excel 5.0 replacing Lotus
I had not yet used the Internet
1994
C++ & Visual Basic
LAN, Printer, Database
Client-server architecture
SQL: standard, interpreted
Flexible and expressive command-line environment
"SQL for secretaries": good for English speakers
2000: bad for IDEs with IntelliSense
Stored procedures rule
DBA is king of business logic
1997
Learning SQL
NoSQL advocates say:
SQL is hard to learn
To really scale, you must de-normalize
Most people don't get normalization right
Impedance mismatch between OO and SQL
I just need a simple key-value store
Henrik says:
MS Access easy to learn
Darn, I always got normalization
I know, teaching n:n relations was always fun
INSERT INTO t ... serialize($obj)
SELECT value FROM t WHERE key=?;
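For readers who never had the fun of teaching n:n relations: in a normalized schema, a many-to-many relationship is modeled with a junction table holding one row per pair. The sketch below uses Python's built-in sqlite3 module in place of MS Access or MySQL; the student/course schema and the students_in() helper are hypothetical illustrations, not taken from the talk.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: one row per (student, course) pair models the n:n relation.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Anna"), (2, "Ben")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "Databases"), (20, "Networks")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

def students_in(course_title):
    """Return the sorted names of students enrolled in the given course."""
    rows = conn.execute(
        """SELECT s.name
             FROM student s
             JOIN enrollment e ON e.student_id = s.id
             JOIN course c     ON c.id = e.course_id
            WHERE c.title = ?
            ORDER BY s.name""",
        (course_title,),
    )
    return [name for (name,) in rows]

print(students_in("Databases"))  # ['Anna', 'Ben']
```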
2000
MS Access not available on Linux
PostgreSQL and MySQL
Learn SQL for real
phpMyAdmin to ease the pain
2005
Graduate & do websites for a living.
Spend 3-6 days creating and re-creating properly normalized DB schema
In MS Access I just clicked next, next, next and ok.
Time is money
Invent NoSQL: PHP serialize() + MySQL BLOB
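This "NoSQL on MySQL" trick (the same INSERT INTO t ... serialize($obj) / SELECT value FROM t WHERE key=? pattern from the Learning SQL slide) amounts to a key-value store with one key column and one BLOB column. A minimal sketch, assuming Python's sqlite3 in place of MySQL and json in place of PHP serialize(); the table t(k, v) and the put()/get() helpers are hypothetical.

```python
import json
import sqlite3

# Hypothetical table t(k, v): a primary-key column plus one BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v BLOB NOT NULL)")

def put(key, obj):
    """Serialize obj (json here, standing in for PHP serialize()) and upsert it."""
    blob = json.dumps(obj).encode("utf-8")  # bytes are stored as a BLOB
    conn.execute("INSERT OR REPLACE INTO t (k, v) VALUES (?, ?)", (key, blob))

def get(key):
    """Look up by primary key and deserialize; return None if the key is absent."""
    row = conn.execute("SELECT v FROM t WHERE k = ?", (key,)).fetchone()
    return None if row is None else json.loads(row[0])

put("user:42", {"name": "Anna", "roles": ["admin"]})
print(get("user:42"))  # {'name': 'Anna', 'roles': ['admin']}
print(get("missing"))  # None
```

INSERT OR REPLACE is SQLite syntax; on MySQL the equivalent upsert would be REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE. The trade-off is the one the slide jokes about: no schema work up front, but the database can no longer index or query anything inside the blob.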
2008
Join MySQL AB
Confused about synchronous vs asynchronous replication
Learn a lot about MySQL NDB Cluster
High Availability
...and why it is more difficult for databases.
Performance: transactions/second (throughput)
Response time (latency): percentiles (95% - 99%)
Durability (speaking of databases): committed data is not lost, the D in ACID
High Availability
Get any response at all (tps > 0), measured as a percentage of time (99.999%)
Replicas, point-in-time snapshots, backups
Clustering, monitoring
Failover
Replication, redundancy
Uptime
Availability target | Max downtime per year
90%        | 36.5 days
99%        | 3.65 days
99.5%      | 1.83 days
99.9%      | 8.76 hours
99.99%     | 52.56 minutes
99.999%    | 5.26 minutes
99.9999%   | 31.5 seconds
Beyond system availability: Average downtime per user.
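The table follows from simple arithmetic: allowed downtime = (1 - availability) x length of a year. A small Python sketch, assuming a 365-day year (which is what the figures in the table use); the function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 365-day year, leap years ignored

def max_downtime_per_year(availability_percent):
    """Maximum downtime in seconds per year allowed by an availability target."""
    return (100 - availability_percent) / 100 * SECONDS_PER_YEAR

def human(seconds):
    """Format a duration using the largest convenient unit."""
    if seconds >= 86400:
        return f"{seconds / 86400:.2f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.1f} seconds"

for target in (90, 99, 99.5, 99.9, 99.99, 99.999, 99.9999):
    print(f"{target}%: {human(max_downtime_per_year(target))}")
```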
Clustering frameworks - general
Failover
Heartbeat, Corosync
VM of choice
Red Hat Cluster, Solaris Cluster
...
Clustering frameworks - DB
Failover
Heartbeat, Corosync
MMM, VM of choice
MHA, Tungsten Enterprise
Solaris Cluster, ...
Sounds simple. What could possibly go wrong?
The old master must stop serving (VIP, OS, DB). But it is not responding, so how do you make it stop?
Polling from the outside. Interval = 1 sec, 10 sec, 60 sec!
What if replication fails first and client transactions don't?
Polling checks connectivity between DB nodes, but not from the client's point of view.
Failover can be expensive (SAN, DRBD), so false positives are costly.
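The polling problem can be made concrete with a timeout-based failure detector, the basic idea behind heartbeat-style monitoring. This is a hypothetical minimal sketch, not how Heartbeat, Corosync or MHA are implemented: a node is suspected once it has not answered for timeout_s seconds. A short timeout catches real failures quickly but also flags a merely overloaded node (a false positive); a long one leaves clients waiting. Either way it only sees what the monitor sees, not what clients experience. The clock is injectable so the demo can fake time.

```python
import time

class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector (polling from the outside)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the demo can fake time
        self.last_seen = {}     # node name -> time of its last heartbeat

    def heartbeat(self, node):
        """Record that `node` answered a poll just now."""
        self.last_seen[node] = self.clock()

    def suspected(self):
        """Nodes silent for longer than the timeout, i.e. failover candidates."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)

# Demo with a fake clock: db2 stops answering (crashed, or just overloaded?).
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=10, clock=lambda: fake_now[0])
monitor.heartbeat("db1")
monitor.heartbeat("db2")
fake_now[0] = 5.0
monitor.heartbeat("db1")
fake_now[0] = 12.0
print(monitor.suspected())  # ['db2']
```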
Active-active shared-disk clustering. (State of the art?)
Failover
Oracle RAC (ScaleDB?)
Sounds simple. What could possibly go wrong?
Well, actually it's pretty good. Data integrity protection is good. But...
SAN is considered the biggest SPOF of all.
Recovery time on single-node failure is 60+ seconds.
Why so long? Internally each node locks some pages and processes them locally, so when a node fails the survivors must first take over its locks and recover its pages.
(Bloody expensive)
Synchronous multi-master
Failover
NDB, Galera
Sounds simple. What could possibly go wrong?
Synchronous Multi Master replication rocks :-)
Failure detection inherent in replication protocol.
Instant failovers.
Bonus: Both Galera and NDB provision new nodes automatically.
Problem is solved. Time for new problems...
Performance
SAN has "some" overhead compared to local disk
DRBD = 50% performance penalty
Replication, when implemented correctly, has 0 performance penalty.
Galera and NDB = more performance
Is a clustering solution part of the solution or part of the problem?
"Causes of Downtime in Production MySQL Servers" by Baron Schwartz:
#1: Human error, #2: SAN
Complex clustering framework + SAN =
More problems, not less!
Galera (and NDB) =
Replication based, no SAN or DRBD
No "failover moment", no false positives
No clustering framework needed
No load balancer needed (JDBC loadbalance)
Simple and elegant!
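"JDBC loadbalance" refers to MySQL Connector/J's load-balancing connection URL (jdbc:mysql:loadbalance://host1,host2/db), which lets the driver spread connections over several servers. With a synchronous multi-master cluster this works because any node can take writes. The Python sketch below only illustrates the idea of client-side balancing; the ClientSideBalancer class and host names are hypothetical, and it does not mirror Connector/J's actual strategy.

```python
import itertools

class ClientSideBalancer:
    """Hypothetical round-robin balancer over multi-master DB nodes.

    It only illustrates that, when every node accepts writes, the client
    can pick any live node itself instead of going through a load balancer.
    """

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.down = set()  # hosts currently considered failed
        self._cycle = itertools.cycle(self.hosts)

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def pick(self):
        """Return the next live host, skipping failed ones."""
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host not in self.down:
                return host
        raise RuntimeError("no database node available")

lb = ClientSideBalancer(["db1", "db2", "db3"])
print([lb.pick() for _ in range(4)])  # ['db1', 'db2', 'db3', 'db1']
lb.mark_down("db2")
print([lb.pick() for _ in range(3)])  # ['db3', 'db1', 'db3']
```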
Scale-out and elasticity
...and why it is more difficult for databases.
Scale-out
Invented by MySQL / LAMP stack.
Laughed at by other RDBMSes
Now everyone does it, because the Internet is too big.
Originally with read-only replicas. Then sharding.
Easy for HTTP, inconvenient for databases.
NoSQL systems do it really well.
MySQL NDB does it really well and Galera pretty well.
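Scaling out with read-only replicas means the application (or a proxy) must send writes to the master and may spread reads over the replicas. A hypothetical minimal router, assuming every statement that starts with SELECT is a safe read; a real router also has to deal with SELECT ... FOR UPDATE, transactions, and replication lag (a read from a replica may not yet see your own write).

```python
import itertools

class ReadWriteRouter:
    """Hypothetical read/write split: writes to the master, reads round-robin to replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas, everything goes to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Naive: any statement starting with SELECT counts as a read.
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.master

router = ReadWriteRouter("master", ["replica1", "replica2"])
print(router.route("SELECT * FROM t"))           # replica1
print(router.route("INSERT INTO t VALUES (1)"))  # master
print(router.route("  select 1"))                # replica2
```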
DBA's life is more interesting!
HTTP
Stateless
Usually can partition/shard
Scale-out = boot more servers
Writes = Write to the database
RDBMS
Where everyone else stores their state
Needs expertise to partition/shard
Scale-out = Boot more servers. Back up DB. Restore DB. Set up replication. Tweak application code...
Write = Which partition/node? Beware of read-only slaves. Beware of eventual consistency...
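The "which partition/node?" question is usually answered by hashing the key. A minimal sketch, assuming simple modulo hashing over a hypothetical set of keys. It also shows why "boot more servers" is not enough for an RDBMS: for a uniform hash h, h mod 4 equals h mod 5 only when h mod 20 is 0-3, so going from 4 to 5 shards changes the home shard of roughly 80% of the keys, all of which would have to be moved. zlib.crc32 is used because Python's built-in hash() for strings is randomized per process.

```python
import zlib

def shard_for(key, num_shards):
    """Map a key to a shard number with a stable hash (crc32) and modulo."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(shard_for("user:42", 4))  # same key, same shard, on every run
print(f"{moved_fraction(keys, 4, 5):.0%} of keys move when going from 4 to 5 shards")
```

This is one reason Dynamo-style systems use consistent hashing, where adding a node moves only about 1/N of the keys.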
But it can be done
Automating DB deployments is more complex. But not impossible.
NDB and Galera handle data provisioning really well. But deploying the empty nodes is still manual labor.
Scale-out happens because you must. Scale-down will never happen if it's too much work.
MySQL
Amazon RDS, others.
Severalnines
Also supports Galera and NDB
Scalr
PostgreSQL
Heroku
EnterpriseDB
NoSQL
Usually do this relatively well.
NoSQL
...and why it is more difficult for relational databases.
It used to be
The future is
All Open Source
Things NoSQL guys do really well
No SQL (parsing)
Schemaless = Win! for agile development
HA with quorum consistency: R + W > N
Transparent sharding
Graph databases
N:N relationships, what's the big deal?
Actually makes sense to give up on SQL!
Map Reduce
Bypass ETL, get clean data
Implemented in Java or Python (or Erlang)
Reading tip: the original Amazon Dynamo paper
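The quorum rule R + W > N: with N replicas, a write waits for W acknowledgements and a read queries R replicas. If R + W > N, every possible read set overlaps every possible write set (pigeonhole), so a read reaches at least one replica holding the latest acknowledged write. This assumes strict quorums; Dynamo itself relaxes them with sloppy quorums and hinted handoff. A brute-force check in Python, purely illustrative:

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """True if every set of r replicas shares a replica with every set of w replicas."""
    replicas = range(n)
    return all(
        set(read) & set(write)  # empty intersection is falsy: overlap not guaranteed
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )

# The brute-force result agrees with R + W > N for small clusters.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_overlap(n, r, w) == (r + w > n)

print(quorums_always_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(quorums_always_overlap(3, 1, 2))  # False: 1 + 2 = 3
```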
Best of both worlds
NoSQL | MySQL
No SQL. | HandlerSocket, Memcache API, NDB API
Simple key-value store. | BLOB: SELECT v FROM ... WHERE k=?
...and secondary indexes | Functional indexes, virtual columns, etc.
Quorum consistency. | Synchronous replication: Galera, NDB
Transparent sharding | We have it too: NDB, Spider + proprietary solutions
Graph databases | Damn N:N relations!
Map Reduce against text files | Map Reduce against RDBMS
Java and Python. | C++ can be done right: Drizzle
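"Map Reduce against RDBMS" just means the input records come from SQL rows instead of text files. A single-process sketch of the map, shuffle and reduce phases (word count), assuming a hypothetical doc table in SQLite; a real framework would run each phase in parallel across many nodes.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical input table standing in for the RDBMS side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO doc (body) VALUES (?)",
    [("databases and the cloud",), ("the cloud and the database",)],
)

def map_phase(body):
    """Emit (word, 1) for every word in one row."""
    return [(word, 1) for word in body.lower().split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

rows = conn.execute("SELECT body FROM doc")
mapped = chain.from_iterable(map_phase(body) for (body,) in rows)
word_counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(word_counts["the"], word_counts["cloud"])  # 3 2
```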
Cloud
...and why it is more difficult for databases.
4 different DB deployments
Server HW
VM
DB process
User (schema)
Consider memory utilization: "All of computation is just different layers of caching."
Dedicated HW
Great performance, €€€€, not cloud
VM
Virtualization overhead
"Safety margin" of unallocated memory per VM
Memory allocation per VM is fixed (without reboot) -> cache of idle DB instances is wasted.
DB process
No virtualization overhead
Memory allocation per DB instance is fixed (without restart) -> cache of idle DB instances is wasted.
User (schema)
No virtualization overhead.
A busy schema can use more cache, while an idle schema is evicted from cache.
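The difference between the deployment models is whether cache memory can flow between tenants. A hypothetical sketch of the schema-level case: one LRU page cache shared by all schemas, built on Python's OrderedDict. Pages of an idle schema are simply evicted when a busy schema needs room, which a fixed per-VM or per-instance allocation cannot do. Page contents are faked; only the eviction behavior matters here.

```python
from collections import OrderedDict

class SharedBufferPool:
    """Hypothetical LRU page cache shared by all tenants (schemas)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # (schema, page_id) -> data, least recent first

    def read(self, schema, page_id):
        """Return True on a cache hit, False on a miss (page is then loaded)."""
        key = (schema, page_id)
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            return True
        self.pages[key] = f"data for {key}"  # stand-in for a disk read
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False

    def pages_held(self, schema):
        """How many cached pages currently belong to a schema."""
        return sum(1 for s, _ in self.pages if s == schema)

pool = SharedBufferPool(capacity_pages=4)
pool.read("idle_schema", 0)
pool.read("idle_schema", 1)
for _ in range(3):  # the busy schema keeps touching its working set
    for page in range(4):
        pool.read("busy_schema", page)
print(pool.pages_held("busy_schema"), pool.pages_held("idle_schema"))  # 4 0
```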
Multi-tenancy
Web hosting with MySQL
cPanel = hack
Not "cloud". Users expect a "dedicated" database instance.
Drizzle
True multi-tenancy, "virtualization for databases"
Ready but experimental
Thank you