NoSQL Databases, Big Data and the Cloud

Post on 07-Jul-2015

Category:

Data & Analytics

DESCRIPTION

Summary of NoSQL databases.

TRANSCRIPT

Manu Cohen-Yashar

The Cloud, Big Data and

NoSQL

Agenda

Data boom

Problems with RDBMS

No SQL

Big Data

What’s next

Understand NO SQL

Types of databases

Primary usage

Data model

Pros and Cons

Lots of Data

Data doubles every 18 months

Pictures

Web site

emails

Sensors

Geo Information

Financial Information

Science

Art

. . . (Infinite list)

No Limits

With the cloud it is now possible to mount a cluster of any size and run any computation at any scale.

The one who will make sense of all available

data will rule the world.

The conclusion:

Use the cloud to analyze data at large scale.

Let's Talk About Data

When we think of data we think of …

Data has many forms

Yet data comes in many forms and shapes

Graphs

Documents

Time Series

Blobs

Geo

Sensors

Unstructured

Structured

Web

Problems with RDBMS

Does not scale very well

Sharding

Replication

Models data according to the relational model

Is this the best model for all data types?

Complex and Expensive

Requires a DBA

Expensive to buy

Oracle

SQL

Non-Relational

Not all types of data fit well into the relational

world.

Not all data use cases fit well into the ACID

convention

The relational model does not scale very well

Difficult to distribute

Difficult to replicate

The CAP Theorem

RDBMS

Replicated NoSQL

Sharded NoSQL

During a network partition, a distributed system must choose either Consistency or Availability.
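
The trade-off stated above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `TwoNodeStore` class, its `mode` flag and the two-replica layout are hypothetical and do not correspond to any real database.

```python
class Replica:
    """One copy of a single value, as held by one node."""
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Hypothetical two-replica store; `mode` is 'CP' or 'AP'."""
    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: update every replica.
            for replica in self.replicas:
                replica.value = value
            return True
        if self.mode == "CP":
            # Prefer consistency: refuse the write rather than let replicas diverge.
            return False
        # Prefer availability: accept the write locally; replicas now disagree.
        self.replicas[node].value = value
        return True

    def read(self, node):
        return self.replicas[node].value


cp = TwoNodeStore("CP")
ap = TwoNodeStore("AP")
for store in (cp, ap):
    store.write(0, "v1")       # replicated to both nodes
    store.partitioned = True   # the network splits

cp_accepted = cp.write(0, "v2")  # rejected: consistent, but not available for writes
ap_accepted = ap.write(0, "v2")  # accepted: node 0 and node 1 now return different values
```

Real systems sit between these two extremes (quorums, tunable consistency), so treat the two modes as the end points of a range.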

NO SQL

Large family of databases

No Schema

No relations enforced

Designed for high scale and distribution

Types of NO SQL DB

Key Value

Wide Columns

Documents

Graph

Motivation for NO SQL

Large Scale and Distribution

Simplicity

Low cost

Good fit with the data model

Volume, Velocity and Variety

What Is No Schema

Some data is structured, and some is not.

NoSQL databases do not ENFORCE a schema the way RDBMSs do.

You can leverage data structure by creating

indexes and smart queries.
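
As a rough illustration of "no enforced schema, but indexes where structure exists", the sketch below keeps records with different fields in one collection and indexes only the field that some of them share. The record contents and the `build_index` helper are made up for this example.

```python
from collections import defaultdict

# Records in the same collection need not share the same fields.
records = [
    {"id": 1, "name": "Dana", "country": "Israel"},
    {"id": 2, "name": "Luc", "country": "France", "email": "luc@example.com"},
    {"id": 3, "sensor": "t-17", "reading": 21.5},  # no "country" at all
]


def build_index(items, field):
    """Map each value of `field` to the ids of the records that have it."""
    index = defaultdict(list)
    for item in items:
        if field in item:  # records lacking the field are simply skipped
            index[item[field]].append(item["id"])
    return index


country_index = build_index(records, "country")
israel_ids = country_index["Israel"]
```

Nothing rejects the third record for lacking a `country`; it just does not appear in that index.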

Types of NO SQL Databases

Key values

Wide column

Document

Graph

Key values

Data is organized as key-value pairs

Query by key and values

Simple indexes (by partition key)

Examples

Azure Table Storage

Amazon DynamoDB

Key1 Key2 Value1 Value2 Value3 Value4 Value5

Israel 1234 1 2 3

France 2345 4 5 8
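
The table above can be mimicked with a plain dictionary keyed by the two keys. This is a minimal in-memory sketch, not the DynamoDB or Azure Tables API. The slide does not show which of the five value columns each number belongs to, so placing them in `Value1` to `Value3` is an assumption.

```python
# (partition key, row key) -> attributes; rows may carry different attributes.
table = {
    ("Israel", 1234): {"Value1": 1, "Value2": 2, "Value3": 3},  # column placement assumed
    ("France", 2345): {"Value1": 4, "Value2": 5, "Value3": 8},  # column placement assumed
}


def get(partition_key, row_key):
    """Point lookup by the full key: the cheapest query in a key-value store."""
    return table.get((partition_key, row_key))


def scan_partition(partition_key):
    """Return every row stored under one partition key."""
    return {k: v for k, v in table.items() if k[0] == partition_key}


def filter_by_value(name, value):
    """Query by value: without an index this has to visit every row."""
    return [k for k, v in table.items() if v.get(name) == value]


row = get("Israel", 1234)
french_rows = scan_partition("France")
matches = filter_by_value("Value2", 5)
```

The contrast between `get` and `filter_by_value` is the point: key lookups stay cheap as the table grows, while value queries need either a scan or an extra index.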

Demo

DynamoDB and Azure Tables

Wide column / Column Families

Data is organized as keys mapped to groups of values

Store data by column

A column family is how the data is stored on the disk

Query by key / key range only

No Indexes (on some dbs)

Examples

Google Bigtable

Cassandra

HBase

Example – Cassandra Data Model

Column

Key value

Super Column

Collection of columns

Column Family

Dictionary of columns

Super Column Family

Dictionary of Column Families
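
One way to picture the four terms above is as nested dictionaries. The sketch below is a loose mental model in plain Python, not Cassandra's storage format or client API; the row keys, column names and helper functions are invented for illustration.

```python
# Column: a single name -> value pair.
column = {"email": "dana@example.com"}

# Column family: row key -> dictionary of columns. Rows may hold different columns.
users = {
    "dana": {"email": "dana@example.com", "country": "Israel"},
    "luc": {"email": "luc@example.com"},
}

# Super column: a named collection of columns ("home", "work" below).
# Super column family: row key -> super column name -> columns.
addresses = {
    "dana": {
        "home": {"city": "Haifa", "zip": "31000"},
        "work": {"city": "Tel Aviv"},
    },
}


def get_column(family, row_key, name):
    """Look up one column by row key and column name."""
    return family.get(row_key, {}).get(name)


def key_range(family, start, end):
    """Row keys that fall in [start, end], in key order."""
    return [key for key in sorted(family) if start <= key <= end]


dana_country = get_column(users, "dana", "country")
luc_country = get_column(users, "luc", "country")  # this row has no such column
work_city = addresses["dana"]["work"]["city"]
keys_a_to_k = key_range(users, "a", "k")
```

Both helpers start from the row key, which matches the "query by key / key range only" restriction on the previous slide.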

Demo

Cassandra

Document Database

Data is organized as key-document pairs

Query by key and document content

Use indexes

Examples

MongoDB

RavenDB

CouchDB / Couchbase
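
The three query styles listed for document databases (by key, by content, through an index) can be sketched with nested dictionaries. This is an in-memory illustration only, not the query API of any of the products named above; the documents and helper names are hypothetical.

```python
documents = {
    "order:1": {"customer": {"name": "Dana", "country": "Israel"}, "total": 120},
    "order:2": {"customer": {"name": "Luc", "country": "France"}, "total": 80},
    "order:3": {"customer": {"name": "Noa", "country": "Israel"}, "total": 45},
}


def get_path(doc, path):
    """Follow a dotted path such as 'customer.country' into a nested document."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def find(path, value):
    """Query by document content: a full scan, with no index."""
    return sorted(k for k, d in documents.items() if get_path(d, path) == value)


def build_path_index(path):
    """Precompute value -> keys so later lookups avoid the scan."""
    index = {}
    for key, doc in documents.items():
        index.setdefault(get_path(doc, path), []).append(key)
    return index


by_key = documents["order:2"]                      # query by key
scanned = find("customer.country", "Israel")       # query by content
country_index = build_path_index("customer.country")
indexed = sorted(country_index["Israel"])          # same answer via the index
```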

Demo

Graph databases

Data is organized as elements (nodes) and relations (edges).

Query by relations

Supports complex mathematical graph calculations

Examples

Neo4j

Stardog (used for the semantic web)
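
"Query by relations" can be sketched with a list of labeled edges and a breadth-first search. This is a toy model in plain Python, not a Neo4j or Stardog query; the names, relation labels and helper functions are made up.

```python
from collections import deque

# Each edge is (source, relation, target); edges are directed.
edges = [
    ("Dana", "FRIEND_OF", "Luc"),
    ("Luc", "FRIEND_OF", "Noa"),
    ("Noa", "WORKS_AT", "Acme"),
    ("Dana", "WORKS_AT", "Acme"),
]


def neighbors(node, relation):
    """Follow one relation type outward from a node."""
    return [t for s, r, t in edges if s == node and r == relation]


def hops(start, goal, relation):
    """Breadth-first search: fewest `relation` edges from start to goal, or None."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


friends_of_dana = neighbors("Dana", "FRIEND_OF")
dana_to_noa = hops("Dana", "Noa", "FRIEND_OF")
noa_to_dana = hops("Noa", "Dana", "FRIEND_OF")  # no path: edges are one-way here
```

A graph database stores the edges so that this kind of traversal does not require scanning the whole edge list, which is what the list comprehension in `neighbors` does.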

RDF and OWL

Triple

Subject - Predicate - Object

Define facts

RDF (Resource Description Framework)

Defines some extra structure to triples.

Example: "rdf:type" is used to say that things are of certain types.

Schema: Defines some classes which represent the concept of subjects, objects, predicates etc.

Enables making statements about classes of things and types of relationships.

OWL

Adds semantics to the schema.

Expressed in triples.

Example: "If A isMarriedTo B" then this implies "B isMarriedTo A".
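
The marriage example can be worked through with triples stored as Python tuples. This is a hand-rolled sketch rather than a real RDF library or reasoner: the short names stand in for full IRIs, and the only rule implemented is symmetry, declared here with `owl:SymmetricProperty`.

```python
# Facts as (subject, predicate, object) triples.
triples = {
    ("A", "isMarriedTo", "B"),
    ("A", "rdf:type", "Person"),
    # The schema statement is itself a triple.
    ("isMarriedTo", "rdf:type", "owl:SymmetricProperty"),
}


def infer_symmetric(facts):
    """Add (o, p, s) for every (s, p, o) whose predicate is declared symmetric."""
    symmetric = {s for s, p, o in facts
                 if p == "rdf:type" and o == "owl:SymmetricProperty"}
    inferred = {(o, p, s) for s, p, o in facts if p in symmetric}
    return facts | inferred


def match(facts, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in facts
                  if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2]))


closed = infer_symmetric(triples)
spouses_of_b = match(closed, s="B", p="isMarriedTo")
```

The fact "B isMarriedTo A" was never stored; it appears only after the rule is applied, which is the sense in which OWL adds semantics on top of the triples.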

Demo

There is no single NoSQL solution for all use cases

Important

There are more than 150 possible offerings…

Replication and Sharding

NoSQL databases can span a large cluster

Replication

Copy the data to multiple servers

Usually each data element is copied 3 times

One master, two slaves

Result: High Availability

Sharding

Split the data between servers

Horizontal partitioning of the data

Result: Horizontal scale

Replication and Sharding can be done together
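
Combining the two ideas might look roughly like the sketch below: a hash of the key chooses the shard, and the next two servers hold the copies. The server names, the five-node cluster and the "next servers in the list" placement rule are all assumptions for illustration; real systems use more elaborate schemes such as consistent hashing.

```python
import hashlib

SERVERS = ["node-0", "node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster
REPLICAS = 3  # one master plus two slaves, as on the slide


def placement(key):
    """Return [master, slave, slave] for a key.

    Sharding: a stable hash of the key picks the master.
    Replication: the next REPLICAS - 1 servers in the list hold the copies.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return [SERVERS[(start + i) % len(SERVERS)] for i in range(REPLICAS)]


def read_from(key, down=()):
    """Pick the first replica of `key` that is not in the `down` collection."""
    for server in placement(key):
        if server not in down:
            return server
    return None


owners = placement("Israel:1234")
fallback = read_from("Israel:1234", down={owners[0]})  # master is down: a slave answers
```

Sharding spreads different keys across the cluster (horizontal scale), while the replica list is what keeps a key readable when its master fails (high availability).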

The Cloud and NO SQL

All Cloud Providers have NO SQL solutions

Azure Tables

Google Bigtable

Amazon DynamoDB

NO SQL Databases are deployed on a cluster

There are a large number of cloud hosting offerings for NoSQL clusters

MongoHQ (MongoDB)

Cassandra on Google Compute Engine

Many more

Example – Mongo in Azure

Check your schema

Be open to using NoSQL data stores

Identify your use-case and find the right

database for you

Create a simple POC

Questions
