next generation databases july2010

Post on 04-Dec-2014


1

© 2010 Quest Software, Inc. ALL RIGHTS RESERVED

This is Not Your Father’s Database: Everything You Need to Know Now About Cloud Computing and Emerging Database Technology 

Guy Harrison

Director Research and Development, Melbourne

guy.harrison@quest.com

www.guyharrison.net

2

Introductions

3

4

• Mainframes
• Minicomputers
• Client Server
• Internet/Y2K Boom
• After the gold rush

6

Current Day Trends
• Big Data
• Cloud computing
• Solid State Disk

7

Big Data
• The Industrial Revolution of data*
– User generated data: Twitter, Facebook, Amazon
– Machine generated data: RFID, POS, cell phones, GPS
• Traditional RDBMS is neither economical nor capable

* http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html

8

Big data 1: Google

9

Map Reduce

[Diagram: a Start step fans out to many parallel Map tasks, whose outputs feed a Reduce step]
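The fan-out of Map tasks into a Reduce step can be sketched in a few lines. This is a minimal single-machine illustration, assuming a word-count job; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the thread pool standing in for a cluster are illustrative choices, not anything shown on the slide.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key so each key is reduced exactly once."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks, workers=4):
    # The thread pool is a stand-in for many machines running Map tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = map_reduce(["big data big", "data cloud", "big cloud cloud"])
```

Because each Map task only sees its own chunk, adding more chunks and more workers scales the job out without changing the Map or Reduce functions.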

10

Hadoop: Open source Map-reduce

• Yahoo! Hadoop cluster:
– 4000 nodes
– 16 PB disk
– 64 TB of RAM
– 32,000 Cores
– Very Low $/TB
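Dividing the cluster totals by the node count shows how ordinary each machine is. The arithmetic below is a rough sketch that assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and an even spread of resources across nodes, neither of which the slide states.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000

# Assumed decimal units: 1 PB = 1000 TB and 1 TB = 1000 GB.
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES // NODES
```

Under those assumptions each node works out to 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity hardware rather than a specialised storage array.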

11

Hive

[Diagram: SQL is translated into Java (MapReduce jobs), which return Results]

12

Big Data 2: Web 2.0

13

Twitter Growth

14

The fail whale

15

[Diagram: Web Servers in front of Memcached Servers and Database Servers. The database tier is split into Shard (A-F), Shard (G-O) and Shard (P-Z), with Read Only Slaves.]
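The sharded, cached architecture in this diagram can be sketched as a key-range router plus a cache-aside read path. This is a minimal in-memory illustration: the A-F, G-O and P-Z ranges come from the slide, while `shard_for`, `ShardedStore`, the shard names and the dictionary standing in for memcached are hypothetical.

```python
# Key ranges taken from the slide; the shard names are made up for the sketch.
SHARD_RANGES = [("A", "F", "shard_a_f"), ("G", "O", "shard_g_o"), ("P", "Z", "shard_p_z")]


def shard_for(username):
    """Pick a shard from the first letter of the key."""
    if not username:
        raise ValueError("empty key")
    first = username[0].upper()
    for low, high, shard in SHARD_RANGES:
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers key {username!r}")


class ShardedStore:
    def __init__(self):
        self.cache = {}  # stands in for the memcached tier
        self.shards = {name: {} for _, _, name in SHARD_RANGES}
        self.db_reads = 0

    def put(self, username, profile):
        self.shards[shard_for(username)][username] = profile
        self.cache.pop(username, None)  # drop any stale cached copy

    def get(self, username):
        # Cache-aside: try the cache first, fall back to the owning shard.
        if username in self.cache:
            return self.cache[username]
        self.db_reads += 1
        profile = self.shards[shard_for(username)].get(username)
        if profile is not None:
            self.cache[username] = profile
        return profile


store = ShardedStore()
store.put("guy", {"city": "Melbourne"})
first_read = store.get("guy")
second_read = store.get("guy")  # served from the cache, no database read
```

The application, not the database, owns the routing logic here, which is a large part of why this approach becomes hard to operate as the shard count grows.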

16

Clouds and Elastic provisioning

[Chart: Capacity / Demand against Time. Capacity steps up at each Hardware upgrade while Demand varies, so the system is alternately Over provisioned and Under provisioned.]

17

CAP Theorem

[Diagram: the three CAP properties (Consistency, Availability, Partition Tolerance). RDBMS and NoSQL are placed on the diagram, and one region is marked "NO GO".]

18

In search of the elastic database
• Big Web sites AND Cloud applications need servers that scale up (and down) on demand
• Elastic provisioning works fine for web servers, application servers, etc.
• However RDBMS does not scale easily:
– SQL Azure limited to one database <50GB on a single host
– Oracle’s RAC not supported in cloud environments
– MySQL sharding “obnoxious”
• Many are willing to sacrifice relational database features for scalability and operational simplicity

19

The NoSQL movement

20

NoSQL (A.K.A. Cloud databases)
• Generally DO NOT support:
– SQL
– Transactions
– Immediate consistency
• Usually DO support:
– Elasticity (scale out AND in)
– Eventual consistency
– Inherent redundancy and fault tolerance
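Eventual consistency is easier to see in a toy model. The sketch below assumes last-write-wins reconciliation with integer version numbers, one simple strategy among several (real systems may use vector clocks or other schemes); `Replica` and `anti_entropy` are illustrative names.

```python
class Replica:
    """One copy of the data; each key stores a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last write wins: keep the value only if its version is newer.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Background sync: push every replica's entries to all the others."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica(), Replica()
a.write("status", "fail whale", version=1)
b.write("status", "all good", version=2)  # newer write lands on another replica
stale_read = a.read("status")             # replica a has not seen it yet
anti_entropy([a, b])
converged_read = a.read("status")
```

Between the write and the sync, a client reading from replica `a` sees the old value. That window is the price paid for not coordinating every write across all replicas.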

21

NoSQL Data Models

Key Value Stores

Amazon Dynamo

Google BigTable

Document DB

JSON/XML DB

Graph Databases

MemcacheDB

Azure Table Services

Redis

Tokyo Cabinet

SimpleDB

Riak

Voldemort

Cassandra

HBase

Hypertable

CouchDB

MongoDB

Neo4J

FlockDB

23

Not so easy to get the data out....

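[Diagram: data stores spread across three environments and connected through a Data Hub that is queried with SQL]
• Amazon AWS Cloud: MySQL, HBase, SimpleDB
• Microsoft Azure Cloud: SQL Azure, Table Services
• On-Premise (AKA private Cloud): SQL Server, Oracle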

26

Big Data 3: Data Warehousing

[Chart: TB (0 to 600) by year, 1996 to 2010]

27

Data Warehouse players

28

DATAllegro architecture

29

Column Databases (Vertica, Sybase)

• Data is stored together in columns

• Very fast answers to analytic aggregate queries

• Better compression
• Not write optimized
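A small sketch shows why column storage suits aggregates and compresses well. The sample rows and the helper names (`to_columns`, `run_length_encode`) are invented for illustration, and run-length encoding is only one of the compression schemes a column store might use.

```python
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 80.0},
    {"order_id": 3, "region": "EMEA", "amount": 200.0},
    {"order_id": 4, "region": "EMEA", "amount": 50.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
# An aggregate touches only the one column it needs.
total_amount = sum(columns["amount"])
# Similar values sit next to each other, so they compress well.
region_runs = run_length_encode(columns["region"])
```

The same layout explains the last bullet: inserting one new order means appending to every column list separately, so single-row writes cost more than in a row store.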

30

Disk drives and Moore’s law
• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
– CPU clock speeds
– RAM
– Hard Disk Drive storage density
• But not in mechanical components:
– Service time (Seek latency): limited by actuator arm speed and disk circumference
– Throughput (rotational latency): limited by speed of rotation, circumference and data density
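The mechanical limits can be put into numbers with simple arithmetic. The sketch below assumes a 15,000 RPM drive and a 4 ms average seek; both figures are assumptions chosen for illustration, and the model ignores transfer time and queuing.

```python
def average_rotational_latency_ms(rpm):
    """On average the platter must turn half a revolution to reach the data."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def max_random_iops(seek_ms, rpm):
    """Upper bound on random IO per second: one seek plus half a rotation each."""
    return 1000.0 / (seek_ms + average_rotational_latency_ms(rpm))


latency_15k = average_rotational_latency_ms(15_000)  # 2.0 ms
iops_15k = max_random_iops(seek_ms=4.0, rpm=15_000)  # roughly 167
```

Neither input has improved much over the years: platters cannot spin much faster and actuator arms cannot move much faster, so a single drive stays in the low hundreds of random IOs per second however large it becomes.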

31

Big Data vs. Fast Data

Disk trends 2001-2009 (%age change)
• IO Rate: 260
• Disk Capacity: 1,635
• IO/GB: -630
• CPU: 1,013
• IO/CPU: -390

32

SSD to the rescue?

Seek time (us)
• Solid State Disk DDR-RAM: 15
• Solid State Disk Flash: 200
• Magnetic Disk: 4,000

33

Power consumption

[Chart: Watts (logarithmic scale, 1 to 100) for Idle, Seek and Start up, comparing Flash SSD with SATA HDD. Surviving data labels: 8, 10, 20.]

34

Economics of SSD

[Chart: $/GB and $/IOPs (logarithmic scale, $0.10 to $1,000.00) for Capacity HDDs, Performance HDDs, Flash SSDs (read) and DRAM SSDs. Surviving data labels: $13.30, $16.60, $1.40, $0.50, $3.00, $28.00, $100.00, $400.00.]

35

Fast reads but slow writes

Time in microseconds:
• 256 page block erase: 2000
• 4k page write: 250
• 4k page seek: 25
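The gap between reads and writes follows directly from these figures. The sketch reads the chart as 25 microseconds for a 4k page seek, 250 for a 4k page write and 2000 for a 256 page block erase, which is an interpretation of the chart labels, and it assumes a write either needs a fresh erase or does not, with nothing else going on in the device.

```python
# Chart values as interpreted above, in microseconds.
SEEK_US = 25
WRITE_US = 250
ERASE_US = 2000
PAGES_PER_BLOCK = 256


def write_cost_us(needs_erase):
    """Cost of one page write, optionally preceded by a block erase."""
    return WRITE_US + (ERASE_US if needs_erase else 0)


def amortized_write_us():
    """Cost per page when one erase is spread over a full block of writes."""
    return WRITE_US + ERASE_US / PAGES_PER_BLOCK


worst_case = write_cost_us(needs_erase=True)           # 2250 us
write_vs_read = write_cost_us(needs_erase=False) / SEEK_US  # 10x slower than a read
amortized = amortized_write_us()
```

A write that has to wait for an erase costs 90 times as much as a read, which is why write performance depends so heavily on whether pre-erased blocks are available.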

36

Hierarchical storage management

[Diagram: storage tiers ranked by $/IOP and $/GB]
• Main Memory
• DDR SSD
• Flash SSD
• Disk
• Tape

37

In Memory Databases: VoltDB & H-Store
• In Memory Distributed (“Sharded”) Database
• No transactional IO
• ACID transactions (k-safety)
• Single Threaded (no latches or locks)
• Java Stored Procedure transactions
• Hierarchical data model
• Double Shared Nothing (disk OR CPU)
• Spool out to DW for ad-hoc analysis
• Very high TPS for suitable applications
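The "single threaded, no latches or locks" idea can be sketched as one worker thread per partition that runs procedures strictly in order. This is an illustration of the general pattern, not VoltDB's implementation: `Partition`, `deposit` and `partition_for` are hypothetical names, and a Python function stands in for a Java stored procedure.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """One shard of in-memory data owned by exactly one worker thread."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Procedures run one after another, so the data needs no locks.
        while True:
            procedure, args, reply = self.inbox.get()
            if procedure is None:
                return
            reply.put(procedure(self.data, *args))

    def call(self, procedure, *args):
        reply = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, reply))
        return reply.get()

    def stop(self):
        self.inbox.put((None, (), None))
        self.worker.join()


def deposit(data, account, amount):
    """Stand-in for a stored procedure: the whole body is one transaction."""
    data[account] = data.get(account, 0) + amount
    return data[account]


partitions = [Partition() for _ in range(2)]


def partition_for(key):
    """Route a key to its owning partition with a stable hash."""
    return partitions[zlib.crc32(key.encode()) % len(partitions)]


# Many concurrent clients hit the same account without any locking.
with ThreadPoolExecutor(max_workers=8) as clients:
    pending = [
        clients.submit(partition_for("acct-1").call, deposit, "acct-1", 1)
        for _ in range(200)
    ]
    for future in pending:
        future.result()

balance = partition_for("acct-1").call(deposit, "acct-1", 0)
for partition in partitions:
    partition.stop()
```

The sketch omits error handling (a procedure that raises would stop the worker) as well as durability and replication. It shows only why serial execution per partition removes the need for latches, and why transactions that span partitions are the hard case for this design.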

38

Oracle EXADATA

• RAC clusters provide MPP
• Dedicated storage servers
• High Speed InfiniBand channels
• Smart storage reduces data transfer requirements
• Hybrid Flash & spinning disk storage system
• Flash caching in the database systems

39

The Next Generation?

40


너를 감사하십시요 Thank You Danke Schön

Gracias 有難う御座いました Merci

Grazie Obrigado 谢谢
