NewSQL - Deliverance from BASE and back to SQL and ACID

There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability, but with ACID and relational concepts implemented with ANSI SQL. This session will cover why NoSQL came about, why it has had its day and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.

Tony Rogerson, SQL Server MVP @tonyrogerson http://dataidol.com/tonyrogerson

Upload: tony-rogerson

Post on 08-Jul-2015

Category:

Data & Analytics



TRANSCRIPT

Page 1: NewSQL - Deliverance from BASE and back to SQL and ACID

NewSQL - Deliverance from BASE and back to SQL and ACID

There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability, but with ACID and relational concepts implemented with ANSI SQL.

This session will cover why NoSQL came about, why it has had its day and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.

Tony Rogerson, SQL Server MVP

@tonyrogerson

http://dataidol.com/tonyrogerson

Page 2: Who am I?

Freelance SQL Server professional and Data Specialist

Fellow BCS, MSc in BI, PGCert in Data Science

28 years of development and database experience, 22 of them with SQL Server – starting out in 1986 with VSAM, System W, Application System, DB2 and Oracle, then crossing over to Client/Server and SQL Server from 4.21a in 1993

Awarded SQL Server MVP yearly since 1997

Founded UK SQL Server User Group back in ’99, founder member of DDD, SQL Bits, SQL Relay, SQL Santa

Interested in commodity based distributed processing of Data (naturally!)

Page 3: Agenda

NoSQL
◦ Why the need?
◦ What products are available?

Transactions
◦ BASE
◦ ACID

SQL
◦ What is today’s SQL capable of?
◦ SQL Server performance – NoSQL required?

NewSQL
◦ SQL -> NoSQL -> NewSQL (distributed form of where we started)
◦ Distributed Data and ACID

Discussion

Page 4: Not Only SQL (NoSQL)

WHY THE NEED?

Page 5: Why the Need?

The year is 2001 and
◦ It’s that Big Data thing….
◦ Mainstream Relational Databases (that use SQL) are scale up
◦ More grunt required – buy a bigger box
◦ SAN based storage is ridiculously expensive and complicated, heavy TCO

Y2K + 1
◦ Developers twiddling their thumbs ;)

Web adoption accelerates
◦ Google, Yahoo, Amazon and the like are born
◦ MySQL does not scale – too inflexible
◦ Up front costs of kit for projects/business that may fail – need elasticity

http://www.tomshardware.co.uk/15-years-of-hard-drive-history-uk,review-1908-7.html

Page 6: Products Available

Varied – type of NoSQL database

◦ Graph

◦ Key-Value

◦ Column store/Column Family

◦ Document Store

◦ Object

◦ Relational but without SQL

You name it and there is a product to do it

Page 7: Performance Today [commodity]

[Chart: 64KiB, 100% read; 100% sequential vs 100% random]

Page 8: ACID

Atomicity
◦ The bounds of the transaction – everything within those bounds is a single unit of work
◦ All or nothing

Consistency
◦ Data must reside in the correct Domain of values
◦ Deferrable to the end of the unit of work

Isolation
◦ Changes are Isolated from other users
◦ Other connections cannot update what you have updated or are updating
◦ Multi-Version Concurrency Control (MVCC) – snapshots
◦ Locking

Durability
◦ In system failure your changes are still maintained – nothing is lost
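The "all or nothing" behaviour of atomicity can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for any ACID engine, and the `account` table and `transfer` function are hypothetical names that do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single unit of work."""
    # `with conn:` commits if the block succeeds and rolls back if an
    # exception escapes, so both updates are applied or neither is.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds: both rows change
try:
    transfer(conn, 1, 2, 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                       # the credit made in the same unit is undone

balances = dict(conn.execute("SELECT id, balance FROM account"))
```

In the failed transfer the credit is applied first and then undone by the rollback, which is the point of treating both statements as one unit of work.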

Page 9: BASE (Basically Available, Soft-state, Eventually Consistent)

BASE is a transactional model of sorts (at the global level, rather than for individual transactions)

Specific to Distributed database model

Basically Available – all or some of the system is available

[Diagram: Node 1, Node 2, Node 3]

Page 10: BASE (Basically Available, Soft-state, Eventually Consistent)

Soft-state

Eventually Consistent

System may change over time [as replicas become up-to-date (consistent)]

[Diagram: Node 1, Node 2, Node 3; Insert value ‘A’]
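A toy simulation of the "Insert value ‘A’" picture may help. It is a minimal sketch, assuming writes are accepted by one node and shipped to the others through an asynchronous queue; the `Cluster` class and its methods are invented for illustration and model no particular product.

```python
from collections import deque


class Cluster:
    """Three replicas that converge only after pending messages are applied."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.pending = deque()  # replication messages not yet delivered

    def write(self, node, key, value):
        # The write is acknowledged as soon as the local node has it.
        self.nodes[node][key] = value
        for other in range(len(self.nodes)):
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node, key):
        return self.nodes[node].get(key)

    def replicate_one(self):
        # Deliver a single queued message to its target replica.
        if self.pending:
            target, key, value = self.pending.popleft()
            self.nodes[target][key] = value


cluster = Cluster(3)
cluster.write(0, "k", "A")

# Soft state: replicas disagree until replication catches up.
before = [cluster.read(n, "k") for n in range(3)]

while cluster.pending:
    cluster.replicate_one()

# Eventually consistent: every replica now returns the same value.
after = [cluster.read(n, "k") for n in range(3)]
```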

Page 11: Eventual Consistency in SQL Server

Asynchronous Availability Groups/Database Mirroring

Replication

Eventual / Causal Consistency
◦ Eventual no good for order specific [and important] transactions
◦ Like Merge replication
◦ Causal: deliver messages in correct order [e.g. service broker]
◦ Like Transactional Replication

Page 12: ACID - Distributed

2PC is clunky and doesn’t scale across many nodes

PAXOS – Consensus theory – scales better

Remove the need for distributed ACID altogether

[Diagram: 2PC Transaction, with a Coordinator sending an INSERT to three Subordinates; all or nothing]
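The coordinator/subordinate picture can be sketched as follows. This is a simplified, in-process illustration of two-phase commit, assuming no network failures and no coordinator crash; the `Subordinate` class and `two_phase_commit` function are hypothetical names, not any product's API.

```python
class Subordinate:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that must vote "no"
        self.staged = None
        self.rows = []

    def prepare(self, row):
        # Phase 1: stage the work and vote.
        if self.can_commit:
            self.staged = row
            return True
        return False

    def commit(self):
        # Phase 2a: make the staged work permanent.
        if self.staged is not None:
            self.rows.append(self.staged)
            self.staged = None

    def rollback(self):
        # Phase 2b: discard the staged work.
        self.staged = None


def two_phase_commit(subordinates, row):
    """Coordinator: commit everywhere only if every subordinate votes yes."""
    votes = [s.prepare(row) for s in subordinates]
    if all(votes):
        for s in subordinates:
            s.commit()
        return True
    for s in subordinates:
        s.rollback()
    return False


healthy = [Subordinate("S1"), Subordinate("S2"), Subordinate("S3")]
ok = two_phase_commit(healthy, ("Tony", "Rogerson"))

mixed = [Subordinate("S1"), Subordinate("S2", can_commit=False), Subordinate("S3")]
failed = two_phase_commit(mixed, ("Tony", "Rogerson"))
```

Every transaction costs at least two round trips to every subordinate, which is one reason the slide calls 2PC clunky as node counts grow.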

Page 13: Mixing BASE and ACID

ACID applied on the local data node

BASE applied to remote nodes

Page 14: Relational

Sets

Tables with Rows x Columns

Relational Theory dictates that the row/column intersection is an Atomic value, i.e. it contains only a single value from the domain modelled for that column

Chris Date:
◦ Atomicity cannot really be defined as absolute in Normal Form
◦ a column can contain “relational values” i.e. another table

Normal Form – the process used to define the schema around the data being modelled

Page 15: OldSQL roots

Built for disk storage

Built for single machine, scale-up

Mature SQL language (decades of research) over the Relational Model

SQL extensions to deal with unstructured data (freetext)

Page 16: OldSQL today

ACI [no Durability]

In-Memory

Modified design to work with Flash

Still scale-up

Page 17: SQL Server

Delayed / No-Durability in SQL Server 2014

In-Memory extensions

Entity Attribute Value design combined with ColumnStore

Sparse Columns / Column sets

DEMOS

Page 18: NewSQL

OLDSQL -> SQL -> NEWSQL

Page 19: Describe NewSQL

NewSQL = OldSQL + Transparent_Data_Distribution + ACID

Also – add in the bells and whistles for new tech
◦ Flash
◦ RAM
◦ Processor cache improvements
◦ Better parallelisation across local processor cores

Basically -> Scale out with ACID

Page 20: Latency in a Distributed environment

[Diagram: six servers connected through a switch over 1Gbit ethernet; one runs SQL Server and holds FirstName, Surname, DOB]

Query returns 20,000 rows, 558KiBytes of data

Fastest, Slower, Slowest (Data Travel)
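A rough back-of-the-envelope figure for the "data travel" cost can be worked out from the slide's own numbers. The sketch below assumes the nominal 1Gbit/s line rate and ignores protocol overhead, switch latency and query execution time, so it is a lower bound rather than a measurement.

```python
ROWS = 20_000
PAYLOAD_KIB = 558                     # result set size from the slide
LINK_BITS_PER_SECOND = 1_000_000_000  # nominal 1Gbit ethernet (assumption)

payload_bits = PAYLOAD_KIB * 1024 * 8
wire_time_ms = payload_bits / LINK_BITS_PER_SECOND * 1000
bytes_per_row = PAYLOAD_KIB * 1024 / ROWS

print(f"{wire_time_ms:.2f} ms on the wire, {bytes_per_row:.1f} bytes per row")
```

Even this best case is milliseconds per query, which is why moving the processing to the data tends to beat moving the data to the processing.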

Page 21: Reduce Latency – Data Locality

[Diagram: servers connected through a switch over 1Gbit ethernet, with SQL Server running on several of the servers]

Page 22: Distributed SQL with ACID

[Diagram: SQL Server instances labelled Server1, Server2 and Server2 on the slide, connected through a switch over 1Gbit ethernet]

BEGIN DISTRIBUTED TRAN

INSERT Server3.pres_NEWSQL.dbo.people( ….. )
INSERT Server2.pres_NEWSQL.dbo.people( ….. )
INSERT Server1.pres_NEWSQL.dbo.people( ….. )

COMMIT TRAN

• 2 Phase Commit using DTC
• High Latency
• All or nothing

Page 23: Querying a Distributed Environment

Financial Trading – Global position of the book

TOP 10 customers

Not easy (at speed) in an OLTP setting

[Diagram: nodes N1, N2, N3, N4 connected through a Network Switch]
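One way to see why a global "TOP n" is awkward is to sketch the scatter-gather it needs. The example below is an assumption-laden illustration: each node is just a list of hypothetical `(customer, amount)` rows, and the coordinator merges per-node partial totals before ranking.

```python
import heapq
from collections import Counter


def local_totals(rows):
    """Runs on each node: aggregate only the rows that node holds."""
    totals = Counter()
    for customer, amount in rows:
        totals[customer] += amount
    return totals


def global_top_n(nodes, n):
    """Runs on the coordinator: merge every node's partial totals, then rank."""
    merged = Counter()
    for rows in nodes:
        merged.update(local_totals(rows))  # Counter.update adds the amounts
    return heapq.nlargest(n, merged.items(), key=lambda item: item[1])


# Four nodes (N1..N4); customer "a" has trades on two of them.
nodes = [
    [("a", 5), ("b", 9)],
    [("a", 7), ("c", 8)],
    [("b", 1)],
    [("d", 2)],
]
top_two = global_top_n(nodes, 2)
```

Note that asking each node only for its local top customers would not be enough here: "a" is never the largest on any single node, yet it leads globally, so the partial totals have to cross the network.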

Page 24: Couple {Data, Processing} with {Machine-n}

Page 25: Partitioning

Chop big table up into “horizontal partitions”

Partition key required (Hash, Modulo, Key range)

Each partition is self-contained binding rows by the partitioning key

Access all data through logical view over all partitions (local database)

Table by table basis
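The three partition-key strategies named above can be sketched as small functions. These are illustrative only: the function names are invented here, CRC32 stands in for whatever hash a real engine uses, and the range boundaries are arbitrary.

```python
import bisect
import zlib


def hash_partition(key, partitions):
    """Hash: spread arbitrary keys evenly; neighbouring keys are scattered."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def modulo_partition(key, partitions):
    """Modulo: for integer keys, partition = key mod partition count."""
    return key % partitions


def range_partition(key, lower_bounds):
    """Key range: `lower_bounds` is sorted; partition 0 holds keys below
    lower_bounds[0], partition i holds keys from lower_bounds[i - 1] upwards."""
    return bisect.bisect_right(lower_bounds, key)


bounds = [100, 200]  # three partitions: <100, 100..199, >=200
placements = [range_partition(k, bounds) for k in (50, 100, 150, 200, 250)]
```

Range partitioning keeps neighbouring keys together, which suits range scans, while hash and modulo trade that locality for an even spread.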

Page 26: Shared Nothing

Partitioning+

Each Shard is self-contained and has all the procs, meta-data and of course your partition of data

Shard Key common to multiple tables, for example CustomerID, Email Address.

Greater autonomy across the distributed database

Seeing the entire database as a logical unit is more difficult – joining is a nightmare

[Diagram: Node 1, Node 2, Node 3]

Page 27: Data Distribution using Hashing

Distributed Database Cluster has a fixed number of data nodes

Your data is spread across the database cluster
◦ 10 node cluster; each data item may reside on 3 nodes
◦ Which 3 nodes?

Data key is Hashed to a number – hashing algorithm is deterministic

data-node = f( data-key )
◦ print ( checksum( 'All hale to the ale' ) * 1.) % 10
◦ print ( checksum( 'And a glass of wine for the ladies' ) * 1.) % 10
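The "which 3 nodes?" question can be answered with a deterministic function of the key. The sketch below is one possible scheme, not the one any particular product uses: it assumes SHA-256 as the hash (in place of the slide's `checksum`) and places the extra copies on the next nodes in sequence, which is purely an illustrative choice.

```python
import hashlib

NODES = 10     # fixed cluster size, as on the slide
REPLICAS = 3   # each data item resides on 3 nodes


def data_nodes(data_key, nodes=NODES, replicas=REPLICAS):
    """Return the node numbers that hold `data_key`.

    Python's built-in hash() is salted per process for strings, so a
    stable digest is used to keep the mapping deterministic across runs.
    """
    digest = hashlib.sha256(data_key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % nodes
    # Assumption: the remaining copies go to the following nodes, wrapping round.
    return [(first + i) % nodes for i in range(replicas)]


placement = data_nodes("All hale to the ale")
```

Because the node count is part of the formula, adding or removing a node changes where most keys land, which is why the slide stresses a fixed number of data nodes.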

Page 28: Sharding Sync

[Diagram: Apps pick a node of the LOGICAL DATABASE (Node 1, Node 2, Node 3); labels: Full copy of data, Subset of data, Replication]

Page 29: Postgres-XC

Coordinators (plans, 2pc trans, knows about data distribution)

Applications (issue SQL to coordinators)

Data Nodes

GTM: Global Transaction Manager

http://de.slideshare.net/PavanDeolasee/postgresxc-28475161

Page 30: Combine Sharding + Replication

Shard your big tables based on a hash (or something) around your business key e.g. Customer, EmailAddress etc.

Replicate static tables.
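The benefit of combining the two is that a join between a sharded table and a static table never leaves the node. The following is a minimal sketch under stated assumptions: the `Node` class, the `countries` lookup table and the sample customers are all hypothetical, and CRC32 stands in for the real hash.

```python
import zlib

NODE_COUNT = 3


def shard_for(customer_id):
    """Pick the node that owns a customer (hash of the business key)."""
    return zlib.crc32(str(customer_id).encode("utf-8")) % NODE_COUNT


class Node:
    def __init__(self, countries):
        self.countries = dict(countries)  # replicated: full copy on every node
        self.customers = {}               # sharded: only this node's subset


def build_cluster(countries, customers):
    nodes = [Node(countries) for _ in range(NODE_COUNT)]
    for customer_id, row in customers.items():
        nodes[shard_for(customer_id)].customers[customer_id] = row
    return nodes


def customer_with_country(nodes, customer_id):
    """Join customer to country entirely on the owning node."""
    node = nodes[shard_for(customer_id)]
    name, country_code = node.customers[customer_id]
    return name, node.countries[country_code]


countries = {"GB": "United Kingdom", "DE": "Germany"}
customers = {1: ("Ann", "GB"), 2: ("Bernd", "DE"), 3: ("Cat", "GB")}
nodes = build_cluster(countries, customers)
result = customer_with_country(nodes, 2)
```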

Page 31: Discussion

@tonyrogerson

http://dataidol.com/tonyrogerson