how to scale relational (oltp) databases. think: sharding @c16lv

How to scale relational (OLTP) databases? Think: Sharding

Session ID: 4829

Prepared by: Maxym Kharchenko, Gluent.com

Whoami

• Started as a database kernel developer

• Then: ORACLE DBA for 15+ years

• Now: Developer at Gluent (past: amazon.com)

• OCM, ORACLE Ace Associate, AWS Developer

• Blog: intermediatesql.com

• Twitter: @maxymkh

The cool things that we do at Gluent

[Diagram: Gluent between applications (App X, App Y, App Z) and data sources (Oracle, Teradata, NoSQL, MSSQL, Big Data Sources)]

We glue these worlds together!

Relational databases are best for OLTP, because they are ACID

Unfortunately, relational databases cannot scale

Conventional wisdom

Riding Moore’s Law (a.k.a. “traditional” database scaling)

[Chart: timeline 2013 to 2017, with markers “You are here!” and “HW overrun here!”]

Traditional database scaling

[Chart: Old System and New System over 2013 to 2017, with marker “Replace HW here!”]

But now: Data grows faster!

[Chart: Old System and New System over 2013 to 2017]

Moore’s Law: The future ain’t what it used to be!

[Chart: Old System and New System over 2013 to 2017]

Let’s move the database to a bigger box

Scaling up …

Scaled!

One machine is just not enough anymore

Use more machines

So, why don’t we use RAC ?

Clusters are hard


And expensive

- Need top-of-the-line HW

- And super smart engineers

- And additional license $$$$$$$$$

Solution: Shared nothing architecture, a.k.a. “Sharded”

Sharded architecture in a nutshell

[Figure: two configurations shown as equivalent (“=”), labelled $$$$$ and $$]

Split your data into small independent chunks

Run each chunk on cheap commodity hardware

The Basics

Sharding is, basically, partitioning

Except, each “partition” is a database

Practical table design for sharding

Let’s “shard” a simple table

CREATE TABLE books (
  id     number PRIMARY KEY,
  title  varchar2(200),
  author varchar2(200)
);

CREATE TABLE books (
  id     number PRIMARY KEY,
  title  varchar2(200),
  author varchar2(200)
) SHARD BY <method> (<shard_key>) (
  SPLIT SIZE evenly
  SPLIT LOAD evenly
  PREFER SINGLE SHARD ACCESS
  DISCOURAGE DATA MOVE
  USING <N> DATABASES
);

Note: not a “real” ORACLE command (yet)

Hey, let’s shard it by “name” range

SHARD BY LIST (first_letter(author)) ( … SPLIT SIZE evenly);

A-G H-M N-T U-Z

Hey, let’s shard it by “id” range

SHARD BY RANGE (id) ( … SPLIT LOAD evenly);

1-100 101-200 201-300 301-400
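A minimal Python sketch of why range sharding on an id may not split load evenly. It assumes ids are handed out in increasing order (an assumption, the slide does not say so), in which case every new row lands in the last range. The boundaries mirror the slide; `shard_for_id` and `insert_load` are hypothetical helper names.

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each id range, mirroring the slide:
# 1-100, 101-200, 201-300, 301-400.
RANGE_UPPER_BOUNDS = [100, 200, 300, 400]


def shard_for_id(book_id: int) -> int:
    """Return the 0-based shard whose id range contains book_id."""
    if not 1 <= book_id <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"id {book_id} is outside every configured range")
    return bisect_left(RANGE_UPPER_BOUNDS, book_id)


def insert_load(new_ids) -> Counter:
    """Count how many inserts each shard receives for a batch of new ids."""
    return Counter(shard_for_id(i) for i in new_ids)
```

With sequential ids, a batch such as `insert_load(range(351, 401))` is counted entirely against shard 3, while the other three shards receive no inserts.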

Hashes are your friend

SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);
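As a rough illustration of why hashing helps, the sketch below maps ids to four shards through a well-mixed hash and counts rows per shard. MD5 is used purely as a deterministic stand-in (an assumption; the slides do not name a hash function), and `stable_hash`, `shard_for` and `distribution` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # matches the four databases used in the slides


def stable_hash(key) -> int:
    """Hash a key to an integer that is identical in every process.

    Python's built-in hash() is randomized per process for strings,
    so it is not suitable for routing; MD5 is only an illustrative choice.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key) -> int:
    """Map a key to a shard number in the range 0..NUM_SHARDS-1."""
    return stable_hash(key) % NUM_SHARDS


def distribution(keys) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k) for k in keys)
```

Calling `distribution(range(1, 10001))` should give four counts of similar size, and the same holds for any contiguous slice of ids, so both size and insert load are spread out.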

But (especially for OLTP) be sure to choose the right hash column

SHARD BY HASH (id) ( PREFER SINGLE SHARD ACCESS);

SELECT title FROM books WHERE id = 34567876;


SHARD BY HASH (id) ( PREFER SINGLE SHARD ACCESS);

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;

SHARD BY HASH (author) ( PREFER SINGLE SHARD ACCESS);

0 1 2 3

SELECT title FROM books WHERE author = 'Isaac Asimov' ORDER BY title;
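The effect of the shard key on query routing can be sketched as follows. This is a simplified model, not how any particular product does it: a query with an equality predicate on the shard key goes to one shard, anything else fans out to all of them. `shards_to_visit` and its `predicates` argument are hypothetical, and MD5 is again only a stand-in hash.

```python
import hashlib

NUM_SHARDS = 4


def shard_for(value) -> int:
    """Deterministically map a shard-key value to one of NUM_SHARDS shards."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_to_visit(shard_key_column: str, predicates: dict) -> list:
    """Return the shards a query must touch.

    predicates maps column name -> equality value,
    e.g. {"author": "Isaac Asimov"}.
    """
    if shard_key_column in predicates:
        # Equality on the shard key: only one shard can hold matching rows.
        return [shard_for(predicates[shard_key_column])]
    # No predicate on the shard key: rows may be anywhere, so fan out.
    return list(range(NUM_SHARDS))
```

With the table sharded by `id`, the author query has to visit all four shards; sharded by `author`, the same query touches exactly one.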


Think about eventual re-sharding

SHARD BY hash(author) ( DISCOURAGE DATA MOVE USING 4 DATABASES);

0 1 2 3

Think about eventual re-sharding

SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE);

0 1 2 3

Discourage data move

SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE);

0 1 2 3 4 5

Major resharding is a PITA

Hash   Mod/4
1      1
2      2
3      3
4      0
5      1
6      2
7      3
8      0
9      1
10     2
11     3
12     0

Hash   Mod/4   Mod/6
1      1       1
2      2       2
3      3       3
4      0       4
5      1       5
6      2       0
7      3       1
8      0       2
9      1       3
10     2       4
11     3       5
12     0       0
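The cost of changing the modulus can be checked with a few lines of Python. The sketch below simply compares `h % 4` with `h % 6` for the hash values in the table; `moved_keys` and `moved_fraction` are hypothetical helper names.

```python
def moved_keys(hashes, old_shards: int, new_shards: int) -> list:
    """Return the hash values whose shard changes when the modulus changes."""
    return [h for h in hashes if h % old_shards != h % new_shards]


def moved_fraction(hashes, old_shards: int, new_shards: int) -> float:
    """Fraction of hash values that end up on a different shard."""
    hashes = list(hashes)
    return len(moved_keys(hashes, old_shards, new_shards)) / len(hashes)
```

For hash values 1 to 12, `moved_keys(range(1, 13), 4, 6)` returns 4 through 11: eight of the twelve rows would have to move to a different database, even though only two databases were added.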

Solution: Logical shards


[Figure: logical shards spread across DB 1, DB 2, DB 3, DB 4]

[Figure: logical shards spread across DB 1, DB 2, DB 3, DB 4, DB 5]
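A minimal sketch of the logical-shard idea, under the assumption that keys are hashed into a fixed number of logical buckets and a separate map says which physical database holds each bucket. The bucket count of 16, the round-robin initial layout and all function names are hypothetical choices for illustration.

```python
TOTAL_BUCKETS = 16  # hypothetical; fixed for the lifetime of the system


def initial_map(databases: list) -> dict:
    """Spread logical buckets over the physical databases round-robin."""
    return {b: databases[b % len(databases)] for b in range(TOTAL_BUCKETS)}


def move_buckets(bucket_map: dict, buckets: list, target_db: str) -> dict:
    """Return a new map with the given logical buckets reassigned."""
    new_map = dict(bucket_map)
    for b in buckets:
        new_map[b] = target_db
    return new_map


def buckets_changed(old_map: dict, new_map: dict) -> list:
    """List the logical buckets whose physical database differs."""
    return sorted(b for b in old_map if old_map[b] != new_map[b])
```

Starting from `initial_map(["DB 1", "DB 2", "DB 3", "DB 4"])` and calling `move_buckets(..., [0, 5, 10, 15], "DB 5")` relocates one bucket from each existing database. The key-to-bucket function never changes, so only the rows in those four buckets move; everything else stays put.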

Accessing your sharded data

Which database ?

SELECT title FROM books WHERE author = 'Isaac Asimov'

Hash(author) -> Lookup (hash)

Executing the query

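import zlib

def shard_query(sql, binds, shard_key):
    """ Execute query in the correct db """
    # crc32 stands in for hash(): Python's built-in hash() of a str
    # changes between processes, which would break routing.
    shard_hash = zlib.crc32(str(shard_key).encode("utf-8"))
    logical_bucket = shard_hash % TOTAL_BUCKETS
    # TOTAL_BUCKETS, memcached_get_db() and execute_query() are
    # provided by the application.
    physical_db = memcached_get_db(logical_bucket)
    return execute_query(physical_db, sql, binds)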

Sharding your ORACLE data

Standbys

[Diagram labels: Apps, Unsharded, Standby, Shard 1, Shard 2, Read Only]

On each shard: drop non-qualifying data

MViews

[Diagram labels: Apps, Shard 1 (TabA), Shard 2 (MVA, TabA), Read Only]

Create materialized view … as select … from a@shard1

Drop materialized view … preserve table

Moving “data head”

[Diagram: Apps, Shard 1, Shard 2]

Logical Shard   Physical Shard
(1,2,3,4)       1
(5,6,7,8)       2

Time   Logical Shard   Physical Shard
2015   (1,2,3,4)       1
2015   (5,6,7,8)       2

[Diagram: Apps, Shard 1, Shard 2]

Time   Logical Shard   Physical Shard
2015   (1,2,3,4)       1
2015   (5,6,7,8)       2
2016   (1,2)           1
2016   (3,4)           3
2016   (5,6)           2
2016   (7,8)           4

[Diagram: Apps, Shard 1, Shard 2, Shard 3, Shard 4]
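The time-aware routing table can be sketched as a lookup keyed by year and logical shard. One reading of the table (an assumption, since the slide does not spell it out) is that rows created in 2015 stay where they are while rows created in 2016 follow the new mapping, so existing data does not have to move. `ROUTES` copies the slide's values; `physical_shard` is a hypothetical helper.

```python
# Routing table copied from the slide:
# year -> {group of logical shards: physical shard}.
ROUTES = {
    2015: {(1, 2, 3, 4): 1, (5, 6, 7, 8): 2},
    2016: {(1, 2): 1, (3, 4): 3, (5, 6): 2, (7, 8): 4},
}


def physical_shard(year: int, logical_shard: int) -> int:
    """Look up the physical shard for data created in the given year."""
    try:
        groups = ROUTES[year]
    except KeyError:
        raise LookupError(f"no routing defined for year {year}") from None
    for logical_group, physical in groups.items():
        if logical_shard in logical_group:
            return physical
    raise LookupError(f"logical shard {logical_shard} is not mapped in {year}")
```

Under this reading, logical shard 3 resolves to physical shard 1 for 2015 data and to physical shard 3 for 2016 data, so only new writes reach the newly added databases.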


So, why shard ? Money!

Why shards are awesome

• (potentially) Unlimited scaling

– 100s or 1000s of shards “in range”

• Once routed in, “it’s pure ORACLE”:

– Transactions, ACID, foreign keys etc.

• Better maintenance:

– Smaller data, smaller load

• Eggs not in one basket:

– Even if a shard is down, “most of the system” is still up

• “Apples to apples comparison” with other shards

Why shards are NOT so great

• More systems

– Power, rack space etc.

– Needs automation … bad

– More likely to fail overall

• Some operations become difficult:

– Transactions across shards

– Foreign keys across shards

• More work:

– Applications, developers, DBAs

– High skill, DIY everything

Thank you!

Please, evaluate my session: 4829

Twitter: @maxymkh

Extras

Data to be “sharded” has to be simple

Your data has to be “simple”

Think this Not that

Not simple

Categorize your data

Then, take it apart

This is also known as splitting “by Nouns” or “by Verbs”

Your data is ready for sharding

When it looks like this

Or like this (at the most)
