NoSQL (Not Only SQL)

Upload: priyodarshini-dhar

Post on 15-Jul-2015

Category:

Engineering


History

What is NoSQL?

CAP Theorem

Eventual Consistency

Data Models

Cassandra

Timeline, 1980 to 2010:

Rise of relational databases. Pros: persistence, concurrency. Cons: impedance mismatch problem.

Rise of object databases

Dominance of relational databases. Cons: data needs increased and distributed databases appeared, but SQL was not designed for distributed DBMSs.

Google BigTable

Amazon Dynamo

NoSQL is a term for a loosely defined class of non-relational data stores that break with the long history of relational databases and ACID guarantees.

Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations.

The term was first popularised in early 2009.

Three properties of a system: consistency, availability and partition tolerance

We can have at most two of these three properties for any shared-data system.

Consistency: all clients see current data regardless of updates or deletes

Availability: the system continues to operate as expected even with node failures

Partition Tolerance: the system continues to operate as expected despite network or message failure

Possible pairings: CA, CP, AP

A consistency model determines rules for visibility and apparent order of updates.

For example:

Row X is replicated on nodes M and N

Client A writes row X to node N

Some period of time t elapses.

Client B reads row X from node M

Does client B see the write from client A?

Consistency is a continuum with tradeoffs

For NoSQL, the answer would be: maybe

The CAP Theorem states: strict consistency can't be achieved at the same time as availability and partition tolerance.

[Figure: row X is replicated on nodes M and N; client A writes X*; client B reads and sees X or X*?]
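The scenario above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular database is implemented: the `Replica` and `Cluster` classes and the explicit `replicate()` step are hypothetical names introduced here to stand in for asynchronous replication.

```python
class Replica:
    """One node holding a copy of row X (hypothetical model)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Cluster:
    """Two replicas, M and N, with replication applied only on demand."""

    def __init__(self):
        self.m = Replica("M", "X")
        self.n = Replica("N", "X")
        self.pending = []  # writes accepted by one node, not yet copied to the other

    def write(self, node, value):
        # The write is applied locally at once and queued for the other replica.
        node.value = value
        other = self.m if node is self.n else self.n
        self.pending.append((other, value))

    def replicate(self):
        # Stands in for "some period of time t elapses".
        for target, value in self.pending:
            target.value = value
        self.pending.clear()


cluster = Cluster()
cluster.write(cluster.n, "X*")   # client A writes row X to node N
before = cluster.m.value         # client B reads from M before replication
cluster.replicate()
after = cluster.m.value          # client B reads from M after replication
print(before, after)
```

Whether client B sees the old or the new value depends only on whether the read happens before or after propagation, which is the "maybe" in the answer above.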

When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent.

Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID

* Basically Available - system seems to work all the time

* Soft State - it doesn't have to be consistent all the time

* Eventually Consistent - becomes consistent at some later time
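A rough sketch of "eventually consistent": once writes stop, repeated exchanges between nodes let the newest version spread until every node agrees. The ring-style exchange and the last-write-wins rule below are assumptions chosen for illustration; real systems use a variety of propagation and conflict-resolution schemes.

```python
def gossip_round(versions):
    """One round: each node exchanges with its right-hand neighbour and both
    keep the newer (timestamp, value) pair (last write wins)."""
    result = list(versions)
    count = len(result)
    for i in range(count):
        j = (i + 1) % count
        newest = max(result[i], result[j])  # tuples compare by timestamp first
        result[i] = result[j] = newest
    return result


def is_consistent(versions):
    """True when every node holds the same version."""
    return len(set(versions)) == 1


# Four nodes holding different versions of the same row; no further updates arrive.
state = [(1, "X"), (3, "X**"), (2, "X*"), (1, "X")]
rounds = 0
while not is_consistent(state):
    state = gossip_round(state)
    rounds += 1

print(state[0], "after", rounds, "round(s)")
```

Between rounds the nodes may disagree (soft state), yet each one can still answer reads (basically available).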

Key-Value Databases

Pros:
- very fast
- very scalable
- simple model
- able to distribute horizontally

Cons:
- many data structures (objects) can't be easily modeled as key-value pairs
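To make the "simple model" and "distribute horizontally" points concrete, here is a minimal sketch of a key-value store whose keys are spread over several nodes by hashing. The class name `PartitionedKVStore` and the sample keys are hypothetical, and plain modulo hashing is used only for brevity (production systems typically prefer consistent hashing so that adding a node moves fewer keys).

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store: each node is a dict, keys are assigned by hash."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = PartitionedKVStore(node_count=3)
store.put("user:123", "Alice")
store.put("user:564", "Bob")
store.put("user:789", "Carol")

print(store.get("user:564"))
print([len(node) for node in store.nodes])  # how the keys were spread
```

The whole interface is `put` and `get` by key, which is why the model is fast and easy to scale, and also why richer object structures are awkward to express in it.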

Document Data Model:

- Each document is a complex structure
- Represented in XML or JSON
- Query into the document structure to retrieve portions of the database
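As a small illustration of querying into a document's structure, the sketch below parses a JSON document and retrieves nested portions by a dotted path. The sample document and the `get_path` helper are made up for this example; real document databases provide their own query languages.

```python
import json

# A hypothetical order document with nested objects and an array.
raw = """
{
  "_id": "order-1",
  "customer": {"name": "Alice", "city": "Pune"},
  "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]
}
"""
doc = json.loads(raw)


def get_path(document, path):
    """Follow a dotted path such as 'items.1.sku' into nested dicts and lists."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric parts index into arrays
        else:
            current = current[part]
    return current


city = get_path(doc, "customer.city")
second_sku = get_path(doc, "items.1.sku")
print(city, second_sku)
```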

Column-Family Data Model:

- Stores different column families

[Figure labels: metadata, key]

Cheap, easy to implement (open source)

Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned

◦ Down nodes easily replaced

◦ No single point of failure

Easy to distribute

Don't require a schema

Can scale up and down

Relax the data consistency requirement (CAP)

What we are giving up…

• joins

• group by

• order by

• ACID transactions

• SQL as a sometimes frustrating but still powerful query language

• Easy integration with other applications that support SQL

Originally developed at Facebook

It is a distributed, extremely scalable, fault-tolerant post-relational database solution

Data Model: column-oriented

Uses the Dynamo Eventual Consistency model

Written in Java

Open-sourced and exists within the Apache family

Uses Apache Thrift as its API

Cassandra was designed with the understanding that system/hardware failures can and do occur.

Peer-to-peer, distributed system

All nodes are the same

Read/Write-anywhere design

Multiple Data Center Write Requests

[Figure: Data center 1, Data center 2]

The coordinator sends the write to all replicas that own the row being written.

As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The consistency level is tunable (for example, LOCAL_QUORUM).
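The write path can be sketched as follows. This is a simplified model under stated assumptions, not Cassandra's actual code: replicas are plain dicts, `coordinate_write` and `make_replica` are hypothetical helpers, and LOCAL_QUORUM is modelled as "a majority of the replicas in the coordinator's own data center acknowledged the write", using the usual quorum size of floor(replicas / 2) + 1.

```python
def quorum(replica_count):
    """Majority of a replica set: floor(n / 2) + 1."""
    return replica_count // 2 + 1


def make_replica(up=True):
    return {"up": up, "data": {}}


def coordinate_write(replicas_by_dc, local_dc, key, value, timestamp):
    """Send the write to every live replica in every data center, then report
    success if a quorum of the local data center acknowledged it."""
    local_acks = 0
    for dc, replicas in replicas_by_dc.items():
        for replica in replicas:
            if not replica["up"]:
                continue  # a down node cannot accept the write right now
            replica["data"][key] = (timestamp, value)
            if dc == local_dc:
                local_acks += 1
    return local_acks >= quorum(len(replicas_by_dc[local_dc]))


# Two data centers with three replicas each; one local replica is down.
cluster = {
    "dc1": [make_replica(), make_replica(), make_replica(up=False)],
    "dc2": [make_replica(), make_replica(), make_replica()],
}
succeeded = coordinate_write(cluster, "dc1", "X", "X*", timestamp=1)
stored = sum(1 for replicas in cluster.values() for r in replicas if "X" in r["data"])
print(succeeded, stored)
```

The consistency level only decides how many acknowledgements the client waits for; every available replica still receives the write.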

There are two types of read requests:

1) direct read request

2) background read repair request

The number of replicas contacted by a direct read request is determined by the consistency level specified by the client.

Background read repair requests are sent to any additional replicas that did not receive a direct request.

Read repair requests ensure that the requested row is made consistent on all replicas.

The coordinator first contacts the replicas specified by the consistency level.

If multiple nodes are contacted, the rows from each replica are compared for consistency in memory.

If replicas are inconsistent, the following events occur:

◦ The coordinator uses the replica that has the most recent data (based on the timestamp) to forward the result back to the client.

◦ In the background, the coordinator compares the data from all the remaining replicas that own the row.
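The read path described above might look roughly like this. It is a sketch under simplifying assumptions: each replica is a dict mapping a row key to a `(timestamp, value)` pair, the hypothetical `read_with_repair` function contacts the first `replicas_to_contact` replicas, and the "background" repair runs inline rather than asynchronously.

```python
def newest_version(replicas, key):
    """Pick the (timestamp, value) pair with the highest timestamp."""
    return max((replica[key] for replica in replicas), key=lambda version: version[0])


def read_with_repair(replicas, key, replicas_to_contact):
    # Direct read: contact only as many replicas as the consistency level asks
    # for, and answer with the most recent data among them.
    contacted = replicas[:replicas_to_contact]
    result = newest_version(contacted, key)

    # Read repair (inline here, in the background in a real system): compare
    # all replicas that own the row and bring stale ones up to date.
    latest = newest_version(replicas, key)
    for replica in replicas:
        if replica[key][0] < latest[0]:
            replica[key] = latest

    return result[1]


# Three replicas of row X; only the first has seen the latest write.
replicas = [{"X": (2, "X*")}, {"X": (1, "X")}, {"X": (1, "X")}]
value = read_with_repair(replicas, "X", replicas_to_contact=2)
print(value, replicas)
```

After the read, all three replicas hold the same version, so a later read from any of them returns the repaired value.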

Apache Thrift: created at Facebook along with Cassandra

Is a cross-language, service-generation framework

Binary Protocol (like Google Protocol Buffers)

Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...

Relational (SQL)

◦ SELECT `column` FROM `database`.`table` WHERE `id` = key;

Cassandra (standard) (Thrift client API)

◦ keyspace.getSlice(key, "column_family", "column")
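The two access styles can be compared side by side in Python. The relational half uses the standard-library `sqlite3` module; the column-family half is only a nested-dict imitation, and `get_slice` here is a hypothetical stand-in that mirrors the argument order of the call above, not a real Cassandra client function. Table, column, and key names are invented for the example.

```python
import sqlite3

# Relational: declare a schema, then select a column by primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("42", "Alice"))
sql_result = conn.execute(
    "SELECT name FROM users WHERE id = ?", ("42",)
).fetchone()[0]
conn.close()

# Column-family imitation: column family -> row key -> {column: value}.
keyspace = {
    "users": {
        "42": {"name": "Alice", "email": "alice@example.com"},
    }
}


def get_slice(key, column_family, column):
    """Look up one column of one row; no schema and no joins involved."""
    return keyspace[column_family][key][column]


cf_result = get_slice("42", "users", "name")
print(sql_result, cf_result)
```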

Thank You