how to scale relational (oltp) databases. think: sharding @c16lv
TRANSCRIPT
How to scale relational (OLTP) databases? Think: Sharding
Session ID: 4829
Prepared by: Maxym Kharchenko, Gluent.com
Whoami
• Started as a database kernel developer
• Then: ORACLE DBA for 15+ years
• Now: Developer at Gluent (past: amazon.com)
• OCM, ORACLE Ace Associate, AWS Developer
• Blog: intermediatesql.com
• Twitter: @maxymkh
The cool things that we do at Gluent
[Diagram: Gluent connecting Oracle, Teradata, NoSQL, MSSQL and Big Data Sources with App X, App Y and App Z]
We glue these worlds together!
Conventional wisdom
• Relational databases are best for OLTP, because they are ACID
• Unfortunately, relational databases cannot scale
Riding Moore’s Law (a.k.a. “traditional” database scaling)
[Chart: timeline 2013 to 2017, marked “You are here!” and “HW overrun here!”]
Clusters are hard
And expensive
- Need top-of-the-line HW
- And super smart engineers
- And additional license $$$$$$$$$
Let’s “shard” a simple table
CREATE TABLE books (
  id number PRIMARY KEY,
  title varchar2(200),
  author varchar2(200)
);

CREATE TABLE books (
  id number PRIMARY KEY,
  title varchar2(200),
  author varchar2(200)
) SHARD BY <method> (<shard_key>) (
  SPLIT SIZE evenly
  SPLIT LOAD evenly
  PREFER SINGLE SHARD ACCESS
  DISCOURAGE DATA MOVE
  USING <N> DATABASES
);
Note: not a “real” ORACLE command (yet)
Hey, let’s shard it by “name” range
SHARD BY LIST (first_letter(author)) ( … SPLIT SIZE evenly);
A-G H-M N-T U-Z
Hey, let’s shard it by “id” range
SHARD BY RANGE (id) ( … SPLIT LOAD evenly);
1-100 101-200 201-300 301-400
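The two range-style layouts above can be sketched as a simple boundary lookup. This is only an illustration: the boundaries are copied from the slides, while the function names and the 0-based shard numbering are assumptions made for the example.

```python
from bisect import bisect_left

# Inclusive upper bounds of each id range from the slide: 1-100, 101-200, 201-300, 301-400.
ID_UPPER_BOUNDS = [100, 200, 300, 400]

# Inclusive upper bounds of each letter range from the slide: A-G, H-M, N-T, U-Z.
LETTER_UPPER_BOUNDS = ["G", "M", "T", "Z"]


def shard_for_id(book_id):
    """Return the 0-based shard index whose id range contains book_id (hypothetical helper)."""
    if not 1 <= book_id <= ID_UPPER_BOUNDS[-1]:
        raise ValueError(f"id {book_id} is outside all configured ranges")
    return bisect_left(ID_UPPER_BOUNDS, book_id)


def shard_for_author(author):
    """Return the 0-based shard index for the first letter of author (hypothetical helper)."""
    first_letter = author.strip()[:1].upper()
    if not "A" <= first_letter <= "Z":
        raise ValueError(f"cannot route author {author!r}")
    return bisect_left(LETTER_UPPER_BOUNDS, first_letter)


print(shard_for_id(150))                 # 1  (range 101-200)
print(shard_for_author("Isaac Asimov"))  # 1  (range H-M)
```

Note that neither layout guarantees even size or even load on its own: the boundaries have to be picked (and later adjusted) to match the actual data.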
But (especially for OLTP) be sure to choose the right hash column
SHARD BY HASH (id) ( PREFER SINGLE SHARD ACCESS);
SELECT title FROM books WHERE id = 34567876;
SHARD BY HASH (id) ( PREFER SINGLE SHARD ACCESS);
SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
SHARD BY HASH (author) ( PREFER SINGLE SHARD ACCESS);
0 1 2 3
SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
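Why the hash column matters can be sketched as follows. When the query filters on the shard column, the router can compute the one shard that holds the rows; otherwise it has to ask every shard. The helper names and the use of CRC32 as the hash are assumptions for illustration only, not how any particular database does it.

```python
import zlib

NUM_SHARDS = 4  # the four shards (0-3) shown on the slide


def shard_of(value):
    """Map a key value to a shard using a deterministic hash (CRC32 is a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


def shards_to_query(shard_column, predicate_column, predicate_value):
    """Return the list of shards a single-column equality query must visit."""
    if predicate_column == shard_column:
        # The predicate pins down the shard key: single shard access.
        return [shard_of(predicate_value)]
    # The shard key is unknown: the rows could be anywhere, so fan out.
    return list(range(NUM_SHARDS))


# SHARD BY HASH (id), WHERE id = ...         -> one shard
print(len(shards_to_query("id", "id", 34567876)))
# SHARD BY HASH (id), WHERE author = ...     -> all four shards
print(len(shards_to_query("id", "author", "Isaac Asimov")))
# SHARD BY HASH (author), WHERE author = ... -> one shard again
print(len(shards_to_query("author", "author", "Isaac Asimov")))
```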
Think about eventual re-sharding
SHARD BY hash(author) ( DISCOURAGE DATA MOVE USING 4 DATABASES);
0 1 2 3
Major resharding is a PITA
Hash  Mod/4  Mod/6
1     1      1
2     2      2
3     3      3
4     0      4
5     1      5
6     2      0
7     3      1
8     0      2
9     1      3
10    2      4
11    3      5
12    0      0
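The table can be checked with a few lines of arithmetic. This sketch counts how many of the 12 hash values land on a different shard when the shard count goes from 4 to 6; the function name is made up for the example.

```python
def moved_keys(hashes, old_count, new_count):
    """Return the hash values whose shard number changes between the two shard counts."""
    return [h for h in hashes if h % old_count != h % new_count]


hashes = range(1, 13)  # the 12 hash values from the table
moved = moved_keys(hashes, 4, 6)

print(moved)                                # [4, 5, 6, 7, 8, 9, 10, 11]
print(f"{len(moved)} of {len(hashes)} move")  # 8 of 12 move
```

Two thirds of the data has to be relocated just to add two databases, which is what makes plain `mod(hash, N)` resharding so painful.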
Solution: Logical shards
SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE);
DB 1 DB 2 DB 3 DB 4
DB 1 DB 2 DB 3 DB 4 DB 5
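One way to picture logical shards: the hash always maps to one of 1200 fixed buckets, and a separate lookup table says which database currently owns each bucket. Adding a database then only means reassigning some buckets. The sketch below assumes a round-robin initial layout and a simple "take the surplus from each database" rebalancing rule; both are illustrative choices, not something the slides prescribe.

```python
TOTAL_BUCKETS = 1200  # logical shard count from the slide


def initial_map(databases):
    """Spread the logical buckets round-robin over the physical databases."""
    return {b: databases[b % len(databases)] for b in range(TOTAL_BUCKETS)}


def add_database(bucket_map, new_db):
    """Reassign just enough buckets to new_db to even out the load.

    Returns the new bucket map and the list of buckets that moved.
    """
    owners = {}
    for bucket, db in bucket_map.items():
        owners.setdefault(db, []).append(bucket)

    # Share each database should hold once new_db has joined.
    target = len(bucket_map) // (len(owners) + 1)

    new_map = dict(bucket_map)
    moved = []
    for buckets in owners.values():
        surplus = max(len(buckets) - target, 0)
        for bucket in buckets[:surplus]:
            new_map[bucket] = new_db
            moved.append(bucket)
    return new_map, moved


bucket_map = initial_map(["DB 1", "DB 2", "DB 3", "DB 4"])
new_map, moved = add_database(bucket_map, "DB 5")
print(f"{len(moved)} of {TOTAL_BUCKETS} buckets moved")  # 240 of 1200 buckets moved
```

Only a fifth of the buckets move here, and every bucket that moves goes to the new database; the hash-to-bucket mapping itself never changes.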
Executing the query
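import zlib

def shard_query(sql, binds, shard_key):
    """Execute query in the correct db."""
    # Stable hash: Python's built-in hash() is salted per process for strings,
    # so CRC32 is used here as a stand-in for the real hash function.
    shard_hash = zlib.crc32(str(shard_key).encode("utf-8"))
    logical_bucket = shard_hash % TOTAL_BUCKETS
    # Look up which physical database currently owns this logical bucket
    physical_db = memcached_get_db(logical_bucket)
    return execute_query(physical_db, sql, binds)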
Standbys
[Diagram labels: Unsharded, Standby, Shard 1, Shard 2, Apps, Read Only; on each shard: Drop non-qualifying data]
MViews
[Diagram labels: Apps, Read Only; Shard 1: TabA; Shard 2: MVA, TabA]
Create materialized view … as select … from a@shard1
Drop materialized view … preserve table
Moving “data head”
Time  Logical Shard  Physical Shard
2015  (1,2,3,4)      1
2015  (5,6,7,8)      2
[Diagram: Apps, Shard 1, Shard 2]
Time  Logical Shard  Physical Shard
2015  (1,2,3,4)      1
2015  (5,6,7,8)      2
2016  (1,2)          1
2016  (3,4)          3
2016  (5,6)          2
2016  (7,8)          4
[Diagram: Apps, Shard 1, Shard 2, Shard 3, Shard 4]
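The placement table can be read as a lookup keyed by time period and logical shard. A minimal sketch, assuming the reading that older data stays where it is while new data (the “data head”) is spread over more physical shards; the dictionary layout and function name are made up for the example.

```python
# year -> {group of logical shards: physical shard}, copied from the table.
PLACEMENT = {
    2015: {(1, 2, 3, 4): 1, (5, 6, 7, 8): 2},
    2016: {(1, 2): 1, (3, 4): 3, (5, 6): 2, (7, 8): 4},
}


def physical_shard(year, logical_shard):
    """Return the physical shard holding logical_shard for data from the given year."""
    for logical_group, physical in PLACEMENT[year].items():
        if logical_shard in logical_group:
            return physical
    raise KeyError(f"logical shard {logical_shard} is not placed for {year}")


print(physical_shard(2015, 3))  # 1
print(physical_shard(2016, 3))  # 3
```

The same logical shard resolves to different physical shards depending on the period, so no 2015 rows have to be copied when Shard 3 and Shard 4 come online.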
Why shards are awesome
• (Potentially) unlimited scaling
– 100s or 1000s of shards “in range”
• Once routed in, “it’s pure ORACLE”:
– Transactions, ACID, foreign keys etc
• Better maintenance:
– Smaller data, smaller load
• Eggs not in one basket:
– Even if a shard is down, “most of the system” is still up
• “Apples to apples comparison” with other shards
Why shards are NOT so great
• More systems
– Power, rack space etc
– Needs automation … bad
– More likely to fail overall
• Some operations become difficult:
– Transactions across shards
– Foreign keys across shards
• More work:
– Applications, developers, DBAs
– High skill, DIY everything