OrientDB - the 2nd generation of (Multi-Model) NoSQL - Luigi Dell'Aquila - Codemotion Amsterdam 2016

Upload: codemotion

Post on 12-Apr-2017

TRANSCRIPT

- the 2nd generation of (Multi-Model) NoSQL

And why Graph DBs are the starting point of this revolution

#OrientDB

Andrea Iacono
Software Engineer, OrientDB
http://www.orientdb.com

#OrientDB

Before We Start…

How many of you have already used NoSQL technology?

How many of you are familiar with Graph Databases?

How many of you are already familiar with OrientDB?

#OrientDB

“90% of the data in the world today has been created in the last two years alone.”

- IBM

#OrientDB

Order #134 (Order)

John (Provider)

CBM Amiga 500 (Product)

Frank (Customer)

Monitor 40” (Product)

Mouse (Product)

Bruno (Provider)

Data by itself has little value; it’s the relationship between data that gives it incredible value

#OrientDB

[Diagram: records connected by edges]

Vertices: CBM Amiga 500 (Product), Monitor 40” (Product), Mouse (Product), Frank (Customer), Order #134 (Order), John (Provider), Bruno (Provider)

Edge labels: Sells, Has, Makes

#OrientDB

Key/Value Databases

Document Databases

Graph Databases

Column Databases

#OrientDB

Why do most NoSQL products avoid managing relationships?

#OrientDB

Customers

ID   Name
10   John
11   John
24   Mike
28   Mike

CustomersCities

CustomerID   CityID
10           24
10           33
32           44

Cities

ID   City
24   Milan
33   London
18   Paris
18   Madrid
44   Moscow

#OrientDB

What’s wrong with JOIN?

#OrientDB

Customers, CustomersCities, Cities

Joins are executed every time you cross relationships

Querying millions of records while joining 3-4 tables could generate billions of combinations
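To make the combinatorial growth concrete, here is a minimal sketch in Python using the three tables from the slide. It uses a deliberately naive join that tests every combination of rows; real relational engines use indexes and smarter join plans, so this illustrates the worst case rather than how any specific RDBMS behaves. The function name `naive_join` is made up for this example.

```python
from itertools import product

# Tables copied from the slide.
customers = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customers_cities = [(10, 24), (10, 33), (32, 44)]
cities = [(24, "Milan"), (33, "London"), (18, "Paris"), (18, "Madrid"), (44, "Moscow")]


def naive_join(customers, customers_cities, cities):
    """Join three tables by testing every combination of rows.

    Returns the matching (customer name, city name) pairs and the number
    of row combinations that had to be examined.
    """
    examined = 0
    matches = []
    for (cust_id, name), (link_cust, link_city), (city_id, city) in product(
        customers, customers_cities, cities
    ):
        examined += 1
        if cust_id == link_cust and link_city == city_id:
            matches.append((name, city))
    return matches, examined


matches, examined = naive_join(customers, customers_cities, cities)
print(matches)   # [('John', 'Milan'), ('John', 'London')]
print(examined)  # 60, i.e. 4 * 3 * 5 combinations for only 12 rows
```

The number of combinations is the product of the table sizes, which is why a few million rows per table quickly turns into billions of candidates.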

#OrientDB

This is why database query performance suffers as the database increases in size

O(log N)

#OrientDB

RDBMS performance on traversal

#OrientDB

Solution: Graph Database!

#OrientDB

Graph Theory crash course

#OrientDB

Basic Graph

Andrea Rome

#OrientDB

Property Graph Model*

Andrea (company: OrientDB)

Visited (year: 2016)

Rome (country: Italy)

Vertices and Edges can have properties

Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
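As a rough illustration of the property graph model on this slide, the sketch below models vertices and directed edges as plain Python dataclasses, each carrying its own properties. The class and field names are chosen for this example only and are not OrientDB or Blueprints API names.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    source: Vertex   # edges are directed: source -> target
    target: Vertex
    properties: dict = field(default_factory=dict)


# The example from the slide: Andrea visited Rome in 2016.
andrea = Vertex("Andrea", {"company": "OrientDB"})
rome = Vertex("Rome", {"country": "Italy"})
visited = Edge("Visited", andrea, rome, {"year": 2016})

print(f"{visited.source.name} -{visited.label}-> {visited.target.name}")
# Andrea -Visited-> Rome
```

A second edge between the same two vertices (for example a `Worked` edge) would simply be another `Edge` instance, which is how 1-N and N-M relationships are expressed.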

#OrientDB

1-N and N-M Relationships

Andrea Rome

Visited (year: 2012)

Worked (year: 2016)

An Edge connects only 2 vertices

Use multiple edges to represent 1-N and N-M relationships

#OrientDB

Congrats! This is your diploma in «Graph Theory»

#OrientDB

How does a true* Graph Database manage relationships?

*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB

#OrientDB

Andrea (Vertex) #13:55

Visited, year: 2012 (Edge) #22:11

Rome (Vertex) #15:99

Each element in the Graph has its own immutable Record ID

#OrientDB

Connections use persistent pointers

Andrea (Vertex) #13:55: out = #22:11

Visited, on: 2012 (Edge) #22:11: src = #13:55, dst = #15:99

Rome (Vertex) #15:99: in = #22:11
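The pointer layout on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a dictionary keyed by Record ID stands in for the storage layer, and the field names (`out`, `in`, `src`, `dst`) mirror the labels on the slide rather than OrientDB's internal format.

```python
# Hypothetical in-memory stand-in for the storage layer: Record ID -> record.
records = {
    "#13:55": {"type": "Vertex", "name": "Andrea", "out": ["#22:11"], "in": []},
    "#15:99": {"type": "Vertex", "name": "Rome", "out": [], "in": ["#22:11"]},
    "#22:11": {"type": "Edge", "label": "Visited", "on": 2012,
               "src": "#13:55", "dst": "#15:99"},
}


def neighbours(rid):
    """Follow the outgoing edge pointers of a vertex; no search or join is needed."""
    result = []
    for edge_rid in records[rid]["out"]:
        edge = records[edge_rid]                    # direct lookup by Record ID
        result.append(records[edge["dst"]]["name"])  # direct lookup of the target
    return result


print(neighbours("#13:55"))  # ['Rome']
```

Each hop is a fixed number of direct lookups, independent of how many other records exist.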

#OrientDB

A Graph Database creates the relationship just once (when the edge is created)

VS

RDBMS computes the relationship every time you query a database

#OrientDB

When you move from an RDBMS to a Graph Database you jump from O(log N) speed to near O(1)

With a Graph Database, the traversal time is not affected by database size!

This is huge in the Big Data age
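The O(log N) figure can be illustrated with a small Python sketch. It assumes that an index lookup behaves like a binary search over sorted keys, which is a simplification of the B-tree indexes most relational databases use; `binary_search_steps` is a helper written for this example.

```python
def binary_search_steps(sorted_ids, wanted):
    """Return how many probes a binary search needs to find `wanted`."""
    lo, hi, steps = 0, len(sorted_ids) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_ids[mid] == wanted:
            return steps
        if sorted_ids[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


# The number of probes grows with the size of the index...
for n in (1_000, 1_000_000):
    ids = range(n)  # a sorted "index" of n keys
    print(n, binary_search_steps(ids, n - 1))

# ...while following a stored pointer is one direct access, whatever the size.
```

Going from a thousand to a million keys roughly doubles the probes per lookup, and that cost is paid again for every relationship crossed.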

#OrientDB

No costs to traverse relationships:

• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering

[Diagram: example graph]

Vertices: John, Thriller, Comedy, Pulp Fiction, Mr Bean, Theater A, Theater B, Theater C, NYC, San Josè

Edge labels: Lives in, Likes, Has, Is, Plays

#OrientDB

So the Graph Model is the only solution to efficiently manage relationships

But what about data complexity? And data consistency?

And scaling?

#OrientDB

First Generation NoSQL

[Chart: Relationships Complexity vs. Data Complexity, positioning Relational, Key Value, Column, Graph and Document databases]

#OrientDB

First Generation NoSQL: Polyglot Persistence

[Diagram: an Application connected to an RDBMS, a Key/Value Store, a Document Database and a Graph Database, with ETL between them]

#OrientDB

First Generation NoSQL: Polyglot Persistence

- No standard between NoSQL Products

- Multiple vendors = multiple skills

- ETL + synchronization code is expensive to write and maintain

- Performance and Reliability are hard to predict

#OrientDB

2nd Generation NoSQL is Multi-Model

#OrientDB

What’s a Multi-Model DBMS?

Graph

Document

Object

Key/Value

Full-Text

Spatial

Multi-Model represents the intersection of multiple models in just one product

#OrientDB

What’s a Multi-Model DBMS?

- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the beginning

Polyglot vs Multi-model

Polyglot (NoSQL 1.0): Polyglot Persistence is a fancy term to mean that when storing data, it is best to use multiple data storage technologies, chosen based upon the way data is being used by individual applications or components.

Multi-model (NoSQL 2.0): Multi-model databases are intended to offer the data modeling advantages of polyglot persistence without its disadvantages. Complexity, in particular, is reduced. The first multi-model database was OrientDB.

https://en.wikipedia.org/wiki/Multi-model_database
http://www.jamesserra.com/archive/2015/07/what-is-polyglot-persistence/

ECOMMERCE

PRODUCT CATALOG

SHOPPING CART

RECOMMENDATION

TRANSACTIONAL

SEARCH

SPATIAL

#OrientDB

{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

Frank --Makes--> Order

General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
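A minimal Python sketch of the schema-hybrid idea, using the Customer document from this slide: a few fields are declared and checked, and everything else (such as the nested `details` document) is accepted as-is. The `REQUIRED` mapping and the `validate` function are hypothetical names invented for this illustration; they are not how OrientDB declares or enforces a schema.

```python
import json

# The Customer document from the slide, as standard JSON.
raw = """
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""
doc = json.loads(raw)

# Hypothetical schema-hybrid rule: these fields are mandatory and typed,
# any other field is free-form.
REQUIRED = {"name": str, "surname": str}


def validate(document, required=REQUIRED):
    """Return a list of problems; an empty list means the document is accepted."""
    problems = []
    for field_name, field_type in required.items():
        if field_name not in document:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(document[field_name], field_type):
            problems.append(f"wrong type for field: {field_name}")
    return problems


print(validate(doc))            # []
print(doc["details"]["city"])   # London
```

With an empty `REQUIRED` mapping this behaves as schema-less; declaring every field would correspond to schema-full.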

#OrientDB - @ldellaquila

Second Generation NoSQL

[Chart: Relationship Complexity vs. Data Complexity, positioning Relational, Key Value, Column, Graph, Document and Multi-Model databases]

#OrientDB

With a true Graph, Document and Object Oriented engine

#OrientDB

API & Standards

• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API

• SQL + extensions for graphs

• JDBC driver to connect any BI tool

• HTTP/JSON support

• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more

#OrientDB - @ldellaquila

Requirements and Dependencies

• OrientDB footprint is minimal and the embedded version can run with a few MB of RAM

• OrientDB needs a Java Runtime

• When run distributed, OrientDB uses the embedded Hazelcast library (Apache 2 licensed)

#OrientDB - @ldellaquila

Security

• Basic HTTP authentication (+HTTPS/SSL)

• User/Role authentication system. One User can have multiple Roles

• Privileges are managed in Roles

• Roles can inherit from other Roles

• Record-level security: every record can specify which users/roles can create/read/update/delete it

• Auditing available in Enterprise Edition
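The user/role model on this slide (privileges live in roles, roles can inherit from other roles, a user can hold several roles) can be sketched in Python as follows. The class names, role names and privilege strings are invented for this illustration and do not reflect OrientDB's actual security classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Role:
    name: str
    privileges: set = field(default_factory=set)  # e.g. {"read", "update"}
    parent: Optional["Role"] = None                # role this one inherits from

    def allows(self, operation):
        """True if this role, or any role it inherits from, grants the operation."""
        role = self
        while role is not None:
            if operation in role.privileges:
                return True
            role = role.parent
        return False


@dataclass
class User:
    name: str
    roles: list

    def can(self, operation):
        """A user may perform an operation if any of their roles allows it."""
        return any(role.allows(operation) for role in self.roles)


reader = Role("reader", {"read"})
writer = Role("writer", {"create", "update"}, parent=reader)
frank = User("frank", [writer])

print(frank.can("read"))    # True, inherited from the reader role
print(frank.can("delete"))  # False, no role grants it
```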

#OrientDB - @ldellaquila

Encryption

• HTTPS/SSL

• Starting from OrientDB v2.2:
- Support for Kerberos
- Encryption at rest using AES and DES of the entire database or portions
- PBKDF2 hash algorithm with a 24-bit length salt per user for a configurable number of iterations
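For readers unfamiliar with PBKDF2, this is roughly what salted, iterated password hashing looks like using Python's standard `hashlib.pbkdf2_hmac`. It is a generic sketch, not OrientDB's implementation: the underlying hash (SHA-256), the default iteration count and the salt size used here are assumptions picked for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000, salt_bytes=24):
    """Derive a key from a password with PBKDF2-HMAC-SHA256 and a per-user salt."""
    if salt is None:
        salt = os.urandom(salt_bytes)  # fresh random salt for each user
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, derived


def verify_password(password, salt, iterations, expected):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, _, derived = hash_password(password, salt, iterations)
    return hmac.compare_digest(derived, expected)


salt, iterations, stored = hash_password("s3cret")
print(verify_password("s3cret", salt, iterations, stored))  # True
print(verify_password("wrong", salt, iterations, stored))   # False
```

Storing the salt and iteration count next to the hash is what makes the iteration count configurable: it can be raised later for newly created or changed passwords.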

#OrientDB - @ldellaquila

Administration

• Full Backup and Restore

• Delta Backup and Restore (v2.2) is available in Enterprise Edition

• Studio web tool

• Command line Console

#OrientDB - @ldellaquila

Data Extraction and Loading

• Import/Export in JSON

• Import from SQL script

• OrientDB ETL tool (http://orientdb.com/docs/last/ETL-Introduction.html)

• Teleporter (v2.2)

#OrientDB - @ldellaquila

Scale out and HA

• Multi-Master architecture

• Tunable consistency through the usage of a quorum, per database or single class (table)

• Synchronous and Asynchronous replication

• Zero config: if multicast is enabled the server is attached to the cluster
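The idea of tunable consistency through a quorum can be sketched generically in Python: a write is accepted once enough nodes acknowledge it, and the required number is a setting. The function names and the majority default are assumptions for this illustration; they are not OrientDB configuration keys or its replication protocol.

```python
def majority(total_nodes):
    """Smallest number of nodes that forms a majority of the cluster."""
    return total_nodes // 2 + 1


def write_accepted(acks, total_nodes, quorum=None):
    """A write succeeds when at least `quorum` nodes acknowledge it.

    With quorum=None a majority of the cluster is required; a lower value
    favours availability, a higher one favours consistency.
    """
    if quorum is None:
        quorum = majority(total_nodes)
    return acks >= quorum


print(write_accepted(acks=2, total_nodes=4))            # False: a majority of 4 is 3
print(write_accepted(acks=3, total_nodes=4))            # True
print(write_accepted(acks=1, total_nodes=4, quorum=1))  # True: relaxed quorum
```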

#OrientDB

Multi-master Replication

[Diagram: four Master Nodes, each connected to elements labelled C]

Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

#OrientDB

[Diagram: four Master Nodes, elements labelled C, and an Auto-Discovered Node]

#OrientDB

Architectures

#OrientDB

Udemy Getting Started Training is ★★★★★ and Free

http://www.orientechnologies.com/getting-started

OrientDB Enterprise is Free for Development

OrientDB Community is FREE for any purpose (APACHE 2 license)

#OrientDB

DEMO