DataStax C*ollege Credit: What and Why NoSQL?


Upload: datastax

Post on 15-May-2015


DESCRIPTION

In the first of our bi-weekly C*ollege Credit series, Aaron Morton, DataStax MVP for Apache Cassandra and Apache Cassandra committer, and Robin Schumacher, VP of product management at DataStax, look back at the history of NoSQL databases and provide a foundation of knowledge for people looking to get started with NoSQL or wanting to learn more about this growing trend. You will learn how to tell whether NoSQL is right for your application and how to pick a NoSQL database. This webinar is C* 101 level.

TRANSCRIPT

Page 1: DataStax C*ollege Credit: What and Why NoSQL?

1

C*ollege Credit

Aaron Morton
Robin Schumacher

What and Why NoSQL?

Page 2: DataStax C*ollege Credit: What and Why NoSQL?

2

• 40 minute webinar
• 15 minute Q+A
• #CassandraQA
• WebEx Q&A window
• Slides and recording will be available
• Next webcast:
  • Time for a new relationship? (Information Week)
  • September 26th

Housekeeping

Page 3: DataStax C*ollege Credit: What and Why NoSQL?

3

The Presenters

Aaron Morton (@aaronmorton)
DataStax MVP for Apache Cassandra

Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

www.thelastpickle.com

Page 4: DataStax C*ollege Credit: What and Why NoSQL?

4

The Presenters

Robin Schumacher
VP of Products @ DataStax

Robin Schumacher has spent the last 20 years working with databases and big data. Before DataStax he was at EnterpriseDB, where he built and led a market-driven product management group. Previously, Robin started and led the product management team at MySQL for three years before they were bought by Sun, and then by Oracle. He also started and led the product management team at Embarcadero Technologies. Robin is the author of three database performance books and a frequent speaker at industry events. Robin holds BS, MA, and Ph.D. degrees from various universities.

Page 5: DataStax C*ollege Credit: What and Why NoSQL?

5

Today.

What is No SQL.

Page 6: DataStax C*ollege Credit: What and Why NoSQL?

6

Once upon a time in 1996...

Visual Basic v4.0 16bit
Novell NetWare v4.10
Btrieve v6.15

Page 7: DataStax C*ollege Credit: What and Why NoSQL?

7

No SQL 1996...

Procedural API

Indexed Sequential Access Method (ISAM)

Page 8: DataStax C*ollege Credit: What and Why NoSQL?

8

No SQL 1996...

Define indexes to support read pattern.

Client side joins.(Yes in VB4.)
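A minimal Python sketch of what a client-side join over an ISAM-style store involves. The record layouts, the `customers_by_id` index, and the `join_orders_to_customers` function are hypothetical illustrations, not Btrieve's API or the code from the talk.

```python
# Hypothetical records, as an application might read them from two ISAM files.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 100, "customer_id": 2, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
    {"order_id": 102, "customer_id": 2, "total": 15.5},
]

# An index defined up front to support the read pattern "customer by id".
customers_by_id = {c["customer_id"]: c for c in customers}


def join_orders_to_customers(orders, customers_by_id):
    """Join in application code: one index lookup per order record."""
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:  # inner-join semantics: skip orphaned orders
            joined.append({
                "order_id": order["order_id"],
                "name": customer["name"],
                "total": order["total"],
            })
    return joined


rows = join_orders_to_customers(orders, customers_by_id)
```

The store only offers keyed and sequential access, so the application decides which index to consult and in what order, which is the work a SQL JOIN later moved into the database.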

Page 9: DataStax C*ollege Credit: What and Why NoSQL?

9

ANSI SQL for the people...

1986 First ANSI standard.

1989 FOREIGN KEY

1992 New types, JOIN, DDL, Transaction Isolation Levels

1999 Triggers

Page 10: DataStax C*ollege Credit: What and Why NoSQL?

10

MySQL...

1996, v3.19 First public release

1999, v3.23 MyISAM engine, no Transactions

2001, v4.X InnoDB, ACID Transactions, FOREIGN KEY

Page 11: DataStax C*ollege Credit: What and Why NoSQL?

11

Microsoft SQL Server...

1995, v6.0 PRIMARY KEY, FOREIGN KEY

1996, v6.5 JOIN

1998, v7.0 NVARCHAR, replication

2000, v2000 Referential Integrity actions

Page 12: DataStax C*ollege Credit: What and Why NoSQL?

12

PostgreSQL...

1989, v1.0 Small limited release

1997, v6.2 Triggers

1998, v6.3 Sub selects

1999, v6.5.3 MVCC Transactions

2000, v7.0.3 FOREIGN KEY, JOIN

Page 13: DataStax C*ollege Credit: What and Why NoSQL?

13

In search of Scale and Availability...

Caching
• Adds application complexity
• Adds operational complexity
• Thundering Herds
• “There are 2 hard problems in computer science: caching, naming, and off-by-1 errors”

Page 14: DataStax C*ollege Credit: What and Why NoSQL?

14

In search of Scale and Availability...

Sharding
• Adds application complexity
• Adds operational complexity
• Schema defined in multiple databases
• SPOF for shard
• Hard to grow and keep balanced
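The "hard to grow and keep balanced" point can be sketched in Python. This assumes naive hash-modulo placement, a common application-level sharding scheme that the slides do not name; `shard_for` and `moved_fraction` are hypothetical helpers.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key and taking it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_count=4, new_count=5)
```

Under this scheme, growing from 4 to 5 shards relocates roughly four out of five keys, and the application has to carry out that migration itself while staying online.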

Page 15: DataStax C*ollege Credit: What and Why NoSQL?

15

In search of Scale and Availability...

Master-Slave replication
• Fail over may add application complexity
• Unknown asynchronous delay in replication
• Potentially wasting resources on Slave
• Reliability of passive Slave is unknown
• “We failed to fail over to the slave.”

Page 16: DataStax C*ollege Credit: What and Why NoSQL?

16

In search of Scale and Availability...

Write Master - Read Slaves
• Adds application complexity
• Unknown asynchronous delay in replication
• SPOF for writes

Page 17: DataStax C*ollege Credit: What and Why NoSQL?

17

In search of Scale and Availability...

Schema
• ALTER TABLE locks the table
• Must be applied to many individual servers
• “foo varchar(50) DEFAULT NULL”

Page 18: DataStax C*ollege Credit: What and Why NoSQL?

18

And then Google published...

Bigtable: A Distributed Storage System for Structured Data 2006

Page 19: DataStax C*ollege Credit: What and Why NoSQL?

19

And Amazon published...

Dynamo: Amazon’s Highly Available Key-Value Store 2007

Page 20: DataStax C*ollege Credit: What and Why NoSQL?

20

And Facebook published...

Cassandra - A Decentralized Structured Storage System 2008

Page 21: DataStax C*ollege Credit: What and Why NoSQL?

21

And databases went punk (again)...

Page 22: DataStax C*ollege Credit: What and Why NoSQL?

22

Key-Value stores...

2007 Tokyo Cabinet

2009 Redis

2009 Voldemort

2009 Riak

Page 23: DataStax C*ollege Credit: What and Why NoSQL?

23

Document Orientated stores...

2008 Apache CouchDB

2009 MongoDB

Page 24: DataStax C*ollege Credit: What and Why NoSQL?

24

Graph stores...

2007 Neo4j

2009 InfoGrid

2010 InfiniteGraph

Page 25: DataStax C*ollege Credit: What and Why NoSQL?

25

Column Family stores...

2007 Apache HBase (as part of Lucene)

2008 / 2011 Bigtable as part of Google App Engine

2009 Apache Cassandra

2012 Amazon DynamoDB

Page 26: DataStax C*ollege Credit: What and Why NoSQL?

26

Common “No SQL” features...
• Cluster based
• Replication built in
• No schema or flexible schema
• Expect node failure

Page 27: DataStax C*ollege Credit: What and Why NoSQL?

27

• Aaron Morton
• @aaronmorton
• www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 28: DataStax C*ollege Credit: What and Why NoSQL?

28

Why NoSQL..?

Page 29: DataStax C*ollege Credit: What and Why NoSQL?

29

“NoSQL is the stuff of the Internet Age.”

- Andrew Oliver, InfoWorld

Page 30: DataStax C*ollege Credit: What and Why NoSQL?

30

What Characterizes the “Internet Age” with data?

1. Big Data – Concerns…
   • Scaling data velocity, variety, volume

2. Data in the Cloud – Promises…
   • Transparent elasticity
   • Scalability
   • Availability
   • Ease of use (data distribution, redundancy, etc.)
   • All these also needed on premise…

3. Data “everywhere” – needing to support multiple data centers, geographies, etc.

Page 31: DataStax C*ollege Credit: What and Why NoSQL?

31

Why NoSQL?
You have Big Data use cases.

• Volume, variety, velocity
• Complexity of data distribution
• Future proof apps where scaling is concerned

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis” - IDC

Page 32: DataStax C*ollege Credit: What and Why NoSQL?

32

Why NoSQL?
Cassandra – a massively scalable NoSQL database
• Superior write performance for data velocity
• Strong data type support for data variety
• Linear scalability/scale out for data volume
• Fast for both reads and writes

“We’ve seen a 700% performance improvement, while our database grew over 500% at the same time. Plus we’ve saved 40% in operational costs.” - SourceNinja

Page 33: DataStax C*ollege Credit: What and Why NoSQL?

33

Why NoSQL? Cassandra and Performance

YCSB Benchmark
Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email

“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”

Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2012, p. 10. Benchmark paper presented at the International Conference on Very Large Data Bases (VLDB), 2012. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

In the Cloud… In Web Apps…

Page 34: DataStax C*ollege Credit: What and Why NoSQL?

34

Why NoSQL?
You need continuous availability.

• Different than high availability
• For applications that can’t go down
• May involve one or multiple locations

Page 35: DataStax C*ollege Credit: What and Why NoSQL?

35

Why NoSQL?
Cassandra – a continuously available NoSQL DBMS
• Built to overcome the fact that hardware failures can and do occur
• No single point of failure
• Out-of-the-box redundancy of function and data

“For us, the primary motivating factors are continuous availability and multi-data center support. We also like the fact that we can trust Cassandra; when we need to write data, we don’t have to worry that it’s going to get written and be there no matter what.” - RightScale

Page 36: DataStax C*ollege Credit: What and Why NoSQL?

36

Why NoSQL?
You need true location independence.

• Need to read AND write data anywhere
• Data is eventually synchronized in all locations
• Keep data local for fast access

Page 37: DataStax C*ollege Credit: What and Why NoSQL?

37

Why NoSQL?
Cassandra – a location independent database

• Replication is multi-data center, multi-directional capable
• Handles multiple cloud geo-zones
• Supports hybrid on-premise/cloud deployments
• Tunable data consistency

“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing decide we want to move into a certain part of the world, we’re ready.” - Netflix

Page 38: DataStax C*ollege Credit: What and Why NoSQL?

38

Why NoSQL?
You need real-time, transactional capabilities

• For applications needing ACID, use RDBMS
• For applications without ACID requirements, but with transactional needs, use NoSQL
• The “C” in ACID does not apply to NoSQL; the “C” in the CAP theorem does

“Ninety-five percent (95%) of database-driven systems today don’t need ACID transactions.” – Dan McCreary, The CIO’s Guide to NoSQL Webinar

Page 39: DataStax C*ollege Credit: What and Why NoSQL?

39

Why NoSQL?
Cassandra – real-time NoSQL transactions

• Supports AID transactions: atomic, isolated, and durable

• Provides tunable data consistency – per operation – to handle the “C” in the CAP theorem

• No ACID “C” as there are no referential integrity/foreign key constraints

“Cassandra stands at the front of the NoSQL pack when it comes to supporting real-time, Big Data applications.” – Wikibon
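Tunable, per-operation consistency is often explained with the quorum-overlap rule from Dynamo-style systems: a read is guaranteed to reach at least one replica that acknowledged the latest write when the number of replicas read plus the number written exceeds the replication factor. The sketch below shows only that arithmetic; the function names are hypothetical and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
quorum_both = read_overlaps_write(rf, quorum(rf), quorum(rf))  # write 2, read 2
one_and_one = read_overlaps_write(rf, 1, 1)                    # write 1, read 1
all_and_one = read_overlaps_write(rf, rf, 1)                   # write 3, read 1
```

Choosing smaller read or write sets trades that guarantee for lower latency and higher availability, which is the sense in which consistency is tunable for each operation.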

Page 40: DataStax C*ollege Credit: What and Why NoSQL?

40

Why NoSQL?
You need a more flexible/agile data model.
• Escape the rigidity of the relational data model
• Able to easily store and access all data types
• Few worries about performance of “wide” rows

Page 41: DataStax C*ollege Credit: What and Why NoSQL?

41

Why NoSQL?
The Cassandra Data Model - Bigtable

• A row-oriented, column structure
• A column family is similar to an RDBMS table but is more flexible/dynamic
• A row in a column family is indexed by its key. Other columns may be indexed as well

ID Name SSN DOB

Keyspace

Column Family

“Cassandra’s NoSQL data model allows us to insert and query data much more naturally than what we had previously. The analysts who routinely use this data were impressed with the flexibility and speed at which the queries came back.” - NASA
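The keyspace, column family, and row-key structure described on this slide can be pictured as nested maps. The Python sketch below is a mental model only, not how Cassandra stores data; the `users` column family, the row keys, and the values are made up, while the column names follow the slide's diagram.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "users": {
        "row-1": {"ID": "1", "Name": "Ada", "SSN": "000-00-0001", "DOB": "1815-12-10"},
        # Rows in the same column family need not share the same columns.
        "row-2": {"ID": "2", "Name": "Grace"},
    }
}


def get_row(keyspace, column_family, row_key):
    """Look up a whole row by its key, the primary access path."""
    return keyspace[column_family].get(row_key, {})


def get_column(keyspace, column_family, row_key, column, default=None):
    """Read one column from a row; missing columns simply are not stored."""
    return get_row(keyspace, column_family, row_key).get(column, default)


name = get_column(keyspace, "users", "row-2", "Name")
dob = get_column(keyspace, "users", "row-2", "DOB")
```

Here `row-2` has no `DOB` column at all, rather than a NULL placeholder, which is one way to read "more flexible/dynamic" than an RDBMS table.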

Page 42: DataStax C*ollege Credit: What and Why NoSQL?

42

Why NoSQL?
You need a better architecture.

• Master/slave – inherent issues; write bottlenecks
• Sharding – difficult to setup/maintain
• Shared storage – has availability concerns

Page 43: DataStax C*ollege Credit: What and Why NoSQL?

43

Why NoSQL?
Cassandra – a “masterless” architecture

• Peer-to-peer design
• No write bottlenecks
• No manual sharding or shared storage issues
• Less operational overhead

“Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’” - Backupify
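One way to see how a peer-to-peer design avoids manual sharding is a consistent-hashing token ring, the placement idea described in the Dynamo paper. The Python sketch below is simplified and rests on assumptions: the `Ring` class, the SHA-256 token function, and the node names are hypothetical, and Cassandra's actual partitioners and replication strategies are more involved.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a node name or row key onto the ring's token space."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Nodes sit on a ring ordered by token; any node can compute placement."""

    def __init__(self, nodes):
        pairs = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in pairs]
        self._nodes = [n for _, n in pairs]

    def replicas(self, key: str, replication_factor: int):
        """Walk clockwise from the key's token and take the next N nodes."""
        count = min(replication_factor, len(self._nodes))
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._nodes)
        return [self._nodes[(start + i) % len(self._nodes)] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user-42", replication_factor=3)
```

Because placement is a pure function of the key and the ring membership, no node acts as a master for routing, and adding a node only takes over the token range next to it instead of reshuffling every key.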

Page 44: DataStax C*ollege Credit: What and Why NoSQL?

44

Why NoSQL?
Because you need…
• The ability to handle big data use cases
• Continuous availability vs. high availability
• A location independent database
• A real-time, transactional database
• A more flexible/agile data model
• A better architecture

Page 45: DataStax C*ollege Credit: What and Why NoSQL?

45

Key Cassandra Use Cases

• Real-time, big data workloads
• Time series data management
• High-velocity device data consumption and analysis
• Media streaming management (e.g., music, movies)
• Social media (i.e., unstructured data) input and analysis
• Online web retail (e.g., shopping carts, user transactions)
• Real-time data analytics
• Online gaming (e.g., real-time messaging)
• Software as a Service (SaaS) applications that utilize web services
• Online portals (e.g., healthcare provider/patient interactions)
• Most write-intensive systems

Page 46: DataStax C*ollege Credit: What and Why NoSQL?

46

Why NoSQL?

- The CIO’s Guide to NoSQL, Dan McCreary

Page 47: DataStax C*ollege Credit: What and Why NoSQL?

47

More Resources

• Cassandra.Apache.org
• PlanetCassandra.org
• Datastax.com

Thank You!