Scaling Data On Public Clouds Liran Zelkha, Founder [email protected]


Post on 22-May-2015


DESCRIPTION

Check which alternatives you have to scale your data on the public cloud

TRANSCRIPT

Page 1: Scaling data on public clouds

Scaling Data On Public Clouds

Liran Zelkha, Founder

[email protected]

Page 2: Scaling data on public clouds

About Us

• ScaleBase is a new startup targeting the database-as-a-service market (DBaaS)

• We offer unlimited database scalability and availability using our Database Load Balancer

• We currently run in beta mode – contact me if you want to join

Page 3: Scaling data on public clouds

Problem Of Data

• Flickr just hit 5B pictures

• Facebook > 0.5B users

• Farmville has more monthly players than the population of France

Page 4: Scaling data on public clouds

Monday's Keynote

• More data

• More users

• More complex actions

• Shorter response times

Page 5: Scaling data on public clouds

Scalability Pain

http://media.amazonwebservices.com/pdf/IBMWebinarDeck_Final.pdf

[Chart from the linked AWS deck: infrastructure cost over time. Traditional hardware is bought in large capital-expenditure steps to match predicted demand; when actual demand overshoots you lose customers, and when it undershoots you pay an opportunity cost. Automated virtualization tracks actual demand.]

Page 6: Scaling data on public clouds

CAP vs. ACID

• CAP = Consistency, Availability, Partition Tolerance

• ACID = Atomicity, Consistency, Isolation, Durability

• Atomicity – a chain of actions treated as one inseparable unit: all succeed or all roll back

• Isolation – consistent query snapshots, reads across concurrent writes; 4 isolation levels are supported

Page 7: Scaling data on public clouds

ScaleBase Database Scaling In A Box

• The first truly elastic, fault-tolerant SQL-based data layer

• Enables linear scaling of any SQL-based database

[Diagram: legacy clients and applications connect through ScaleBase to multiple database instances]

Page 8: Scaling data on public clouds

ScaleBase

[Diagram: application/web servers connect through ScaleBase to shared-nothing DB machines on commodity hardware – MySQL? Oracle? – scalable and highly available]

Page 9: Scaling data on public clouds

THE REQUIREMENTS FOR DATA SLA IN PUBLIC CLOUD ENVIRONMENTS

Page 10: Scaling data on public clouds

What We Need

• Availability

• Consistency

• Scalability

Page 11: Scaling data on public clouds

Brewer's (CAP) Theorem

• It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

– Consistency (all nodes see the same data at the same time)

– Availability (node failures do not prevent survivors from continuing to operate)

– Partition Tolerance (the system continues to operate despite arbitrary message loss)

http://en.wikipedia.org/wiki/CAP_theorem

Page 12: Scaling data on public clouds

What It Means

http://guyharrison.squarespace.com/blog/2010/6/13/consistency-models-in-non-relational-databases.html

Page 14: Scaling data on public clouds

ACHIEVING DATA SCALABILITY WITH RELATIONAL DATABASES

Page 15: Scaling data on public clouds

Databases And CAP

• ACID – Consistency

• Availability – many solutions, most of them not cloud-oriented

– Oracle RAC

– MySQL Proxy

– Etc.

– Replication-based solutions can solve at least read availability and scalability (see Azure SQL)

Page 16: Scaling data on public clouds

Database Cloud Solutions

• Amazon RDS

• NaviSite Oracle RAC

• Joyent + Zeus

Page 17: Scaling data on public clouds

So Where Is The Problem?

• Scaling problems (usually write but also read)

• Schema change problems

• BigData problems

Page 18: Scaling data on public clouds

Scaling Up

• Issues with scaling up when the dataset is just too big

• RDBMS were not designed to be distributed

• Began to look at multi-node database solutions

• Known as ‘scaling out’ or ‘horizontal scaling’

• Different approaches include:

– Master-slave

– Sharding

Page 19: Scaling data on public clouds

Scaling RDBMS – Master/Slave

• Master-Slave

– All writes go to the master; all reads are performed against the replicated slave databases

– Critical reads may be stale, as writes may not have propagated down yet

– Large data sets can pose problems, as the master must replicate its data to all slaves
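The read/write split described above can be sketched as a tiny statement router in Java. This is an illustrative sketch only – the connection URLs and the statement classification are assumptions, not any particular product's implementation:

```java
public class MasterSlaveRouter {
    // Hypothetical connection URLs - replace with your own master/slave hosts.
    static final String MASTER_URL = "jdbc:mysql://master-host/appdb";
    static final String SLAVE_URL = "jdbc:mysql://slave-host/appdb";

    /** Route writes to the master; everything else goes to a replicated slave. */
    static String routeFor(String sql) {
        String verb = sql.trim().split("\\s+")[0].toUpperCase();
        boolean isWrite = verb.equals("INSERT") || verb.equals("UPDATE")
                || verb.equals("DELETE") || verb.equals("REPLACE");
        return isWrite ? MASTER_URL : SLAVE_URL;
    }
}
```

Note that reads routed this way can return stale data until replication catches up – the "critical reads" caveat above.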

Page 20: Scaling data on public clouds

Scaling RDBMS - Sharding

• Partition or sharding

– Scales well for both reads and writes

– Not transparent, application needs to be partition-aware

– Can no longer have relationships/joins across partitions

– Loss of referential integrity across shards
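The "application needs to be partition-aware" point above boils down to code like the following – a minimal hash-based shard router, assuming a fixed shard count (real sharding layers also handle rebalancing and cross-shard queries):

```java
public class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Map a shard key to a shard index; floorMod keeps negative hash codes in range. */
    int shardFor(String shardKey) {
        return Math.floorMod(shardKey.hashCode(), shardCount);
    }
}
```

Every query touching the sharded table must first call something like shardFor() to pick a connection – which is exactly why joins across partitions stop working.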

Page 21: Scaling data on public clouds

Other ways to scale RDBMS

• Multi-Master replication

• INSERTs only, no UPDATEs/DELETEs

• No JOINs, thereby reducing query time

– This involves de-normalizing data

• In-memory databases

Page 22: Scaling data on public clouds

ACHIEVING DATA SLA WITH NOSQL

Page 23: Scaling data on public clouds

NoSQL

• A term used to designate databases which differ from classic relational databases in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term which would include classic relational databases as a subset.

http://en.wikipedia.org/wiki/NoSQL

Page 24: Scaling data on public clouds

NoSQL Types

• Key/Value – a big hash table – examples: Voldemort, Amazon Dynamo

• Big Table – big table, column families – examples: HBase, Cassandra

• Document-based – collections of collections – examples: CouchDB, MongoDB

• Each solves a different problem
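The key/value model above really is just a big hash table. A minimal in-process sketch of the interface (the real stores like Voldemort and Dynamo add partitioning, replication and persistence on top of this core):

```java
import java.util.concurrent.ConcurrentHashMap;

public class TinyKeyValueStore {
    // One flat namespace of opaque byte[] values - the whole data model.
    private final ConcurrentHashMap<String, byte[]> data = new ConcurrentHashMap<>();

    void put(String key, byte[] value) { data.put(key, value); }
    byte[] get(String key) { return data.get(key); }
    void remove(String key) { data.remove(key); }
}
```

That tiny API surface is both the appeal (it shards trivially by key) and the limitation ("some API is just too simple", as the cons slide below puts it).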

Page 25: Scaling data on public clouds

NO-SQL

http://browsertoolkit.com/fault-tolerance.png

Page 26: Scaling data on public clouds

Pros/Cons

• Pros:

– Performance

– BigData

– Most solutions are open source

– Data is replicated to nodes and is therefore fault-tolerant (partitioning)

– Don't require a schema

– Can scale up and down

• Cons:

– Code change

– Limited framework support

– Not ACID

– Ecosystem (BI, backup)

– There is always a database at the backend

– Some APIs are just too simple

Page 27: Scaling data on public clouds

Amazon S3 Code Sample

AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey,
        secure, server, format);
Response response = conn.createBucket(bucketName, location, null);
final String text = "this is a test";
response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);

Page 28: Scaling data on public clouds

Cassandra Code Sample

CassandraClient cl = pool.getClient();
KeySpace ks = cl.getKeySpace("Keyspace1");

// insert values
ColumnPath cp = new ColumnPath("Standard1", null,
        "testInsertAndGetAndRemove".getBytes("utf-8"));
for (int i = 0; i < 100; i++) {
    ks.insert("testInsertAndGetAndRemove_" + i, cp,
            ("testInsertAndGetAndRemove_value_" + i).getBytes("utf-8"));
}

// get values
for (int i = 0; i < 100; i++) {
    Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
    String value = new String(col.getValue(), "utf-8");
}

// remove values
for (int i = 0; i < 100; i++) {
    ks.remove("testInsertAndGetAndRemove_" + i, cp);
}

Page 29: Scaling data on public clouds

Cassandra Code Sample – Cont’

try {
    ks.remove("testInsertAndGetAndRemove_not_exist", cp);
} catch (Exception e) {
    fail("removing a non-existent row should not throw exceptions");
}

// get already-removed values
for (int i = 0; i < 100; i++) {
    try {
        Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
        fail("the value should already be deleted");
    } catch (NotFoundException e) {
        // expected
    } catch (Exception e) {
        fail("threw another exception, should be NotFoundException: " + e.toString());
    }
}

pool.releaseClient(cl);
pool.close();

Page 30: Scaling data on public clouds

Cassandra Statistics

• Facebook Search

• MySQL > 50 GB Data

– Writes Average : ~300 ms

– Reads Average : ~350 ms

• Rewritten with Cassandra > 50 GB Data

– Writes Average : 0.12 ms

– Reads Average : 15 ms

Page 31: Scaling data on public clouds

MongoDB

Mongo m = new Mongo();

DB db = m.getDB( "mydb" );

Set<String> colls = db.getCollectionNames();

for (String s : colls) {

System.out.println(s);

}

Page 32: Scaling data on public clouds

MongoDB – Cont’

// "coll" is not defined on the previous slide; for example:
DBCollection coll = db.getCollection("testCollection");

BasicDBObject doc = new BasicDBObject();

doc.put("name", "MongoDB");

doc.put("type", "database");

doc.put("count", 1);

BasicDBObject info = new BasicDBObject();

info.put("x", 203);

info.put("y", 102);

doc.put("info", info);

coll.insert(doc);

Page 33: Scaling data on public clouds

THE BOTTOM LINE

Page 34: Scaling data on public clouds

Data SLA

• There is no golden hammer – see http://sourcemaking.com/antipatterns/golden-hammer

• Choose your tool wisely, based on what you need

• Usually:

– Start with an RDBMS (shortest TTM, which is what we really care about)

– When scale issues occur, start moving to NoSQL based on your needs

• You can get data scalability in the cloud – just think before you code!

Page 35: Scaling data on public clouds

A BIT MORE ON SCALEBASE

Page 36: Scaling data on public clouds

How ScaleBase Works

• ScaleBase takes an application database and splits its data across multiple, separate instances (a technique called sharding)

• Queries and DML are either:

– directed to the correct instance, or

– executed simultaneously across several instances

• Results are aggregated and returned to the original application

[Diagram: the application connects through ScaleBase to multiple database instances]
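The "results are aggregated" step above is a scatter-gather merge. A sketch of the idea for a GROUP BY count query – an illustration only, not ScaleBase's actual code:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScatterGather {
    /** Merge per-instance GROUP BY counts into one combined result set. */
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perInstance) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> partial : perInstance) {
            // The same group key may appear on several instances; sum its counts.
            partial.forEach((group, count) -> merged.merge(group, count, Long::sum));
        }
        return merged;
    }
}
```

Counts and sums merge cleanly this way; averages and other non-distributive aggregates need extra bookkeeping (e.g. carrying sum and count separately).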

Page 37: Scaling data on public clouds

Example

Original table:

ID | First name | Last name
100 | Steven | King
101 | Neena | Kochhar
102 | Lex | De Haan
103 | Alexander | Hunold
104 | Bruce | Ernst
105 | David | Austin
106 | Valli | Pataballa

Split across three instances:

ID | First name | Last name
102 | Lex | De Haan
105 | David | Austin

ID | First name | Last name
100 | Steven | King
103 | Alexander | Hunold
106 | Valli | Pataballa

ID | First name | Last name
101 | Neena | Kochhar
104 | Bruce | Ernst

Page 38: Scaling data on public clouds

ScaleBase Supports

• 3 table types – Master, Global, Split

• Splitting according to Hash, List, Range

• Rebalance, addition and removal of machines

• Instance replication backup: Shadow and Master

• Fully consistent 2-Phase Commit

• Joins, Foreign Keys, Subqueries

• DML, DDL, Batch updates, Prepared Statements

• Aggregations, Group By, Order By, Auto Numbering, Timestamps
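The Hash, List and Range split types listed above can be sketched as three routing policies. This is a hedged illustration of the concepts – the method names, the country-list example, and the range boundaries are assumptions, not ScaleBase's API:

```java
import java.util.List;
import java.util.NavigableMap;

public class SplitPolicies {
    /** Hash split: even spread, a good default when no natural grouping exists. */
    static int hashSplit(String key, int shards) {
        return Math.floorMod(key.hashCode(), shards);
    }

    /** List split: explicit value-to-shard assignment, e.g. by country. */
    static int listSplit(String value, List<List<String>> shardValues) {
        for (int i = 0; i < shardValues.size(); i++) {
            if (shardValues.get(i).contains(value)) return i;
        }
        return 0; // fall back to a default shard
    }

    /** Range split: each shard owns a contiguous id range, keyed by its lower bound. */
    static int rangeSplit(long id, NavigableMap<Long, Integer> lowerBoundToShard) {
        return lowerBoundToShard.floorEntry(id).getValue();
    }
}
```

Range splits keep related rows together (good for range scans, prone to hot spots); hash splits spread load evenly (good for point lookups, bad for range scans) – which is why a product would offer all three.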

Page 39: Scaling data on public clouds

Sample Code

• site_owner_id is the split key

• Perform the query on all DBs

• Simple aggregation of results

• No Code Change

SELECT site_owner_id, count(*) FROM google.user_clicks WHERE country = 'BRAZIL' GROUP BY site_owner_id

Page 40: Scaling data on public clouds

Sample Code

• Perform the query on all DBs

• Aggregation of the aggregations

• No Code Change

SELECT country, count(*) FROM google.user_clicks GROUP BY country

Page 41: Scaling data on public clouds

Sample Code

• Is split key dynamic or static?

• Each command is added to the correct DB, execution is on all relevant DBs

• No Code Change

PreparedStatement pstmt = conn.prepareStatement("INSERT INTO emp VALUES(?,?,?,?,?)");
for (int i = 0; i < 10; i++) {
    pstmt.setInt(1, 300 + i);
    pstmt.setString(2, "Something" + i);
    pstmt.setDate(3, new Date(System.currentTimeMillis()));
    pstmt.setInt(4, i);
    pstmt.setLong(5, i);
    pstmt.addBatch();
}
int[] result = pstmt.executeBatch();

Page 42: Scaling data on public clouds

ScaleBase Solution

• An elastic SQL database scaling and high-availability solution:

– Complete

– Transparent

– Super scalable

– Out of the box

– Non-intrusive

– Flexible

– Manageable

Page 43: Scaling data on public clouds

With ScaleBase

• Pay much less for hardware and database licenses

• Get more power, better data spreading and better availability

• Real linear scalability

• Go for real grid, cloud and virtualization

• ScaleBase is NOT:

– NOT an RDBMS. It facilitates any secure, highly available, archivable RDBMS (Oracle, DB2, MySQL, any!)

– Does NOT require schema structure modifications

– Does NOT require additional sophisticated hardware

Page 44: Scaling data on public clouds

Moving To ScaleBase

• Implementing ScaleBase is done in minutes

• Just direct your application to your ScaleBase instance

• Point ScaleBase at your original database and target database instances

• ScaleBase will automatically migrate your schema and data to the new servers

• After all data is transferred, ScaleBase starts working with the target database instances, giving you linear scalability – with no downtime!

Page 45: Scaling data on public clouds

Where ScaleBase Fits In

• Cloud databases

– Use SQL databases in the cloud, and get Scale Out features and high availability

• High scale applications

– Use your existing code, without migrating to NOSQL, and get unlimited SQL scalability

Page 46: Scaling data on public clouds

CUSTOMER CASE STUDIES

Page 47: Scaling data on public clouds

Public Cloud

• Scenario:

– A startup developing a complex iPhone application running on EC2

– Requires a high-scaling option for an SQL-based database

• Problem:

– Amazon RDS offers only a Scale Up option, no Scale Out

• Solution:

– Customer switched their Amazon RDS-based application to ScaleBase on EC2

– Gained transparent, linear scaling

– Running 4 RDS instances behind ScaleBase

Page 48: Scaling data on public clouds

Private Cloud

• Scenario:

– A company selling devices that 'ping' home every 5 minutes

– 8-digit number of devices sold

• Problem:

– Evaluated MySQL, Oracle – no single machine can support the required number of devices

– Clustering options too expensive, limited scalability

• Solution:

– Customer moved to ScaleBase with no code changes

– Gained linear scaling. Runs 4 MySQL databases behind ScaleBase