database sharding the right way: easy, reliable, and open source - highload++ 2012

53
Database Sharding the Right Way: Easy, Reliable, and Open source. Esen Sagynov (@CUBRID), NHN Corporation Service Platform Development Center Monday, October 22, 2012

Upload: cubrid

Post on 12-May-2015

4.606 views

Category:

Technology


5 download

DESCRIPTION

The presentation the CUBRID team presented at Russian HighLoad++ Conference in October, 2012. The presentation covers the topic of Big Data management through Database Sharding. CUBRID open source RDBMS provides native support for Sharding with load balancing, connection pooling, and auto fail-over features.

TRANSCRIPT

Page 1: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Database Sharding the Right Way: Easy, Reliable, and Open

source.

Esen Sagynov (@CUBRID),NHN CorporationService Platform Development Center

Monday, October 22, 2012

Page 3: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Growing in the Wild. The story by CUBRID Database

Developers.

Esen Sagynov (@CUBRID),NHN CorporationService Platform Development Center

Monday, April 2, 2012

Eugen Stoianovici,NHN CorporationCUBRID Development Lab

View on Slideshare

http://profyclub.ru/docs/439

Page 4: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

We talked about…

• Who is NHN• Reasons behind CUBRID

development• What CUBRID has to offer. Benefits &

advantages.• What we have learnt so far. Where

we are heading to.

Page 5: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID Facts RDBMS True Open Source @ www.cubrid.org Optimized for Web services High performance Large DB support High-Availability feature DB Sharding support 90+% MySQL compatible SQL syntax + Oracle

statistics functions ACID Transactions Online Backup Supported by NHN Corporation

Page 6: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Q&A

• Используем ли мы внешние библиотеки, как STL или Boost, в CUBRID?

• Каким образом CUBRID будет выполнять сложные SQL запросы, поддерживать транзакции и ACID во множественных серверах в условиях горизонтального масштабирования?

Page 7: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

=Big Business Opportunity

Page 8: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

How to manage Big Data?

SQL

NoSQL- Open Source- Scalable- Non-standard API

- Enterprise- Vendor dependency- Scalability con-

straints- Common interface

Page 9: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Big Data =? NoSQL• Uses RDBMS with Sharding• Data is stored as simple Key-Value.

• Uses RDBMS with Sharding• Sharding and Replication is abstracted through Gizzard

• Uses RDBMS with Sharding• Hbase usage is limited

• Uses RDBMS to store data• Data caching in a variety of ways

• Uses RDBMS with Sharding• ACID is the reason to use RDBMS

• Uses RDBMS with Sharding• Easier to implement, best suits their needs

• Uses RDBMS with Sharding and HA• Data consistency and relationship are the reason

Page 10: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

What NoSQL lacks?

SQL

Transactions

NoSQL => NoACID

Standard Interface

Experts

Page 11: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Oracle DB Market Share

2009 2010 201140%45%50%55%60%65%70%

KoreaWorldwide

DBMSMarket$MM

Worldwide 21,359 23,252 26,701 11.8%

Korea 349 395 478 17%

Ratio

1.6% 1.7% 1.8%

Source: Gartner, 2012

Page 12: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

RDBMS is still the best choice formission-critical data

Page 13: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Database Sharding

Page 14: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

DB

“User” Ta-ble

id

name

age

1 Jackie

10

2 Bruce

12

3 Chuck

13

4 Billy 14

Shard #1

“User” Ta-ble

id

name

age

1 Jackie

10

2 Bruce

12

DB Sharding

Shard #2

“User” Ta-ble

id

name

age

3 Chuck

13

4 Billy 14

Database Sharding

Page 15: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Sharding SolutionsName Type Requirements Inter-

faceDB ETC

Hibernate shards AS framework

DBMS w/Hibernatesupport

- Hiber-nate

- JVMJava

dbShards AS & Middle-ware MySQL Java, C

Gizzard (Twitter) Middleware Any stor-age

- JVM Java

Spider for MySQL

Middleware &Storage Engine MySQL Any

CUBRID SHARD Middleware

- CUBRID

- MySQL- Oracle

Any

Page 16: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Disadvantages of Existing Solu-tions

• Third-party• Separate installation/hardware• Painful configuration and management• Additional dependency– RDBMS upgrade?– Sharding solution upgrade?

• Generating Unique ID?• Auto-rebalancing?• HA?

Page 17: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

The Ideal Solution

• A single open source RDBMS which provides native support for scalability:– database sharding– connection pooling– load balancing– data auto-rebalancing– high-availability

Page 18: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Is there such RDBMS?

Page 19: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID 9.0

Page 20: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Scalability with CUBRID

CUBRID 8.4.1connection poolingload balancinghigh-availability

CUBRID 9.0 (code name Apricot)database sharding

CUBRID next (code name Banana)data auto-rebalancing

Page 21: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Universal Sharding

• Supports:

coming soon…

Page 22: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Native DB Sharding

JDBC

User Apps

CCI

User Apps

CUBRID SHARD middleware

shard #0 shard #N……

CUBRID

Page 23: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Easy Installation

Page 25: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID SHARD Configura-tion

• shard.conf– Default configuration file for CUBRID

SHARD.

• shard_connection.txt– Predefined list of shard IDs, database

and host names for CUBRID/MySQL.

• shard_keys.txt– A list of shard_key_columns and their

mapping with shard_id

Page 26: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

shard.conf

SHARD_KEY_MODULAR = 256SHARD_KEY_LIBRARY_NAME = ‘’SHARD_KEY_FUNCTION_NAME = ‘’

Page 27: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

shard_connection.txt

Page 28: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

shard_keys.txt

Shard Key Column name id user_id order_no …

=

Page 29: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Custom Libraryint user_get_shard_key(int type, void *val){ int mod = 2;

if (val == NULL) {

return ERROR_ON_ARGUMENT; }

switch(type) {

case SHARD_U_TYPE_INT:{ int ival; ival = (int) (*(int *)val); return ival % 2;} break;case SHARD_U_TYPE_STRING: return ERROR_ON_MAKE_SHARD_KEY;default: return ERROR_ON_ARGUMENT;

} return ERROR_ON_MAKE_SHARD_KEY;}

Page 30: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Configuring CUBRID SHARD is very easy!

Page 31: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

#1) Create Shards• Host 1..N:

$> cubrid createdb shard1$> csql -S -u dba shard1 -c "create user shard password 'shard123’”$> cubrid server start shard1

Page 32: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

#2) Create same tables

$> csql -C -u shard -p 'shard123' shard1@localhost -c ”CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT)”

• Host 1..N:

Page 33: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

#3) Start CUBRID SHARD

$> cubrid shard start@ cubrid shard start ++cubrid shard start: success

Page 34: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Standard Connection URL

connectionURL ="jdbc:cubrid:localhost:45511:shard1:shard:shard123:";

DB name

password

Shard broker port

username

Page 35: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?; ";PrepareStatement query_stmt = connection.prepareStatement(query);query_stmt.setInt(1,100);ResultSet rs = query_stmt.executeQuery();// fetch resultset

2. Query analysis3. shard_key hashing4. Passing the query to the selected shard.

CUBRID

SHARD

shard #0 shard #1 shard #2 shard #4

key_columnrange

(hash result) shard_idmin max

student_no 0 63 0

student_no 64 127 1

student_no 128 191 2

student_no 192 255 3

Shard selection

1. Execute queryClient app

Page 36: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Querying Shards

SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;

SQL hint

Shard key column

• bind variable• fixed value

Page 37: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Types of SQL Hints

SQL Hints Description

/*+ shard_key */ a hint to specify the location of - a bind variable - or the literal valuewhich corresponds to the shard key column

/*+ shard_val(value) */

a hint to explicitly specify the shard key in case thecolumn that corresponds to the shard key does notexist in a query

/*+ shard_id(shard_id) */

A hints which can be used to directly processuser queries on a particular shard

Page 38: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

How did we tackle theunique ID problem?

Page 39: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Generating Unique IDs

• DB Ticket Server– SERIAL (max = 1037 for ascending serial)– Sortable– 64 bits (BIGINT)– SERIAL CACHE (ON/OFF) for improved performance– Auto fail-over through HA

Page 40: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID SHARD in HA

CUBRID SHARD middle-ware

shard #0

Client app

shard #1 shard #2 shard #3

Master

Slave

shard #0 shard #1 shard #2 shard #3

= =sync / async / semi-sync + auto fail-over

…. ….Client app Client app

Page 41: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID SHARD Performance

Page 42: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

nGrinderCon-

troller

nGrinderAgent

nGrinderAgent

nGrinderAgent

nDrive App

Simulator(DBGW

lib)

nDrive App

Simulator(DBGW

lib)

nDrive App

Simulator(DBGW

lib)

http

ShardProxy

Broker

Meta DB 1

Meta DB 2

Meta DB 3

Meta DB 4

User DB

CAS- CUBCCI - CAS

8 agents and simu-

lators

Description Quantity OS (64bit) / CPU / MEM

Agent to generatload andNDrive App Simula-tor

8 Centos5.3 / xeon 2G-8core / 8G

CUBRID Shard 1 Centos5.3 / xeon 2.27G-16core / 24G

CUBRID Broker 1 Centos5.3 / xeon 2.27G-16core / 24G

Meta DB 4 Centos5.x / xeon 2.33G-4core / 8G

User DB 1 Centos5.3 / xeon 2.5G-8core / 8G

Meta DB

User DB

Broker

Shard Proxy

nGrinderAgent

nDrive App

Simulator

Performance Test ScenarioTest environment and hardware configurations

Page 43: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Test scenario #1: Maximum performance test

- Check if it can handle the load of at least 60,000 RPS

RPS = Request Per Second (Throughput)Vuser = # of concurrent users 32 64 96 128 160 192 256 320 384 448 512

020000400006000080000

100000

Load Generator Per-formance

# of concurrent users

RP

S

Increase in Vuser leads to in-crease:- of response time- of CPU Load on Proxy

hardware

After 320 Vuser, RPS de-creases- Proxy server load

(CPU load of the Simulator hard-ware ismaintained under 40%)

64 128 192 256 320010203040506070

0

10000

20000

30000

40000

50000

60000

Performance trend when load is increased

proxy cpu Mean Time(ms)RPS metadb TPS

Max performance conditions- 256 Vuser- 13981.98 RPS- 4 Proxy processes => 200 CAS(If # of Proxy processes is in-creased, performance can be in-creased)

Test Results (1)

Page 44: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Test scenario #2: Performance results compar-ison when: - SHARD is used - is NOT used

- When SHARD is not used, Broker is used.

Test Results (2)

64 128 192 256 3200

10

20

30

40

50

60

70

80

90

0

2000

4000

6000

8000

10000

12000

14000

16000

SHARD vs. Broker Performance Comparison

Broker - broker cpuShard - proxy CPUBroker - Mean Time(ms)Shard - Mean Time(ms)Broker - RPSShard - RPS

Vuser

RP

S

- Similar performance until 128 Vuser- When SHARD is not used, 128 Vuser

is maximum- In SHARD usage case, when # of Vuser is

increase- maximum performance can be

achieved as well as shorter re-sponse time andlower CPU utilization.

Page 45: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

TPC-C Performance Test

Page 46: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Test Scenario

• 9 tables– customer (300000 records)– district (100 records)– history (300000 records)– item (100000 records)– new_order (0 records)– order_line (3000000 records)– orders (300000 records)– stock (1000000 records)– warehouse (10 records)

• The total number of records > 5 million.

• AWS Xlarge in-stance• 7GB RAM• 20 EC2 units

• Ubuntu 12.04 64-bit

• CUBRID 9.0 (beta) –no shrading

• MySQL 5.5.28

• Buffer• 2.8GB

data_buffer_size• 2.8GB

innodb_pool_size

• Default configura-tions

Page 47: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Test Results

TPC-C Index30

34

38

42

46

42.66

44.18

MySQL 5.5.28CUBRID 9.0

Page 48: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID SHARD Features

Single database view No application change Unlimited DB shards Multiple Sharding

Strategies Parameterized queries Shard targeted query

(SQL Hints) Shared Query Plan

Caching

Connection and statement pooling

Load balancing Read only Sharding Broker

1+ Sharding Brokers/Proxies

Native HA support Master-Slave DB

Generic (non-sharded) Tables

Supports CUBRID and MySQL

Page 49: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID SHARD

• Easy– No configuration hassle– No “moving parts”

• Reliable– Auto fail-over– High performance

• Open source– Supported by NHN

Page 50: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

What’s next for CUBRID?

Page 51: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID vNext (Banana)

Auto-rebalancing in CUBRID SHARDPerformance+++SQL Compatibility+++SQL Monitoring++

Page 52: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

CUBRID is Big now.

What can you do?

1. Keep watching it2. Consider using3. Discuss, talk, write about CUBRID4. Support CUBRID in your apps5. Contribute to CUBRID6. Provide CUBRID service

Page 53: Database Sharding the Right Way: Easy, Reliable, and Open source - HighLoad++ 2012

Esen SagynovCUBRID Project Manager

[email protected]

www.cubrid.org

www.facebook.com/cubrid www.twitter.com/cubrid

CUBRID Q&A www.cubrid.org/questions