OrientDB - the 2nd generation of (Multi-Model) NoSQL
Luigi Dell’Aquila
Director of Consulting
Orient Technologies LTD
Twitter: @ldellaquila
http://www.orientdb.com
OrientDB - the 2nd generation of
(Multi-Model) NoSQL
And why Graph Databases are the starting point of this revolution
“90% of the data
in the world today
has been created
in the last two years alone.”
- IBM
Welcome to Big Data
Just Data
[Figure: unconnected records: Order #134 (Order), John (Provider), Commodore Amiga 1200 (Product), Frank (Customer), Monitor 40” (Product), Mouse (Product), Bruno (Provider)]
Data by itself has little value,
it’s the relationship
between data that gives it
incredible value
Relationships give data “meaning”
[Figure: the same records connected by edges: Frank (Customer) Makes Order #134; Order #134 Has the Commodore Amiga 1200, the Monitor 40” and the Mouse (Products); the Providers John and Bruno Sell the Products]
Top NoSQL categories
Key/Value Databases
Document Databases
Graph Databases
Column Databases
Why do most NoSQL products
avoid
managing relationships?
Customer
ID  Name
10  John
11  John
24  Mike
28  Mike

CustomerAddress
CustomerID  AddressID
10          24
10          33
32          44

Address
ID  Location
24  Milan
33  London
18  Paris
18  Madrid
44  Moscow
Is this
familiar?
What’s wrong with JOIN?
Imagine an Address Book
where we want to find Luigi’s phone number
Index Lookup: how does it work?
[Figure: a balanced tree over the alphabet: A-Z splits into A-L and M-Z, A-L into A-D and E-L, M-Z into M-R and S-Z, A-D into A-B and C-D, E-L into E-G and H-L, E-G into E-F and G, H-L into H-J and K-L]
Most index algorithms are similar: they are based on
balanced trees
Found! Finding Luigi took 5 steps (A-Z, A-L, E-L, H-L, K-L). With millions
of indexed records the tree gets deeper (a balanced binary tree over
1,000,000 keys has about 20 levels), and every lookup pays this O(log N) cost.
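A minimal Python sketch of the walk above, under one simplifying assumption: binary search over a sorted list stands in for the balanced tree, since each step halves the remaining range just as each tree level does. Real database indexes are usually B-trees with many keys per node, so they are shallower, but their depth still grows as O(log N).

```python
def lookup_steps(sorted_keys, target):
    """Binary search that counts how many "tree levels" it visits.

    Each iteration halves the remaining range, like stepping down one level
    of a balanced binary tree. Returns (found, steps).
    """
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # descend into the right half
        else:
            hi = mid - 1   # descend into the left half
    return False, steps


for n in (1_000, 1_000_000):
    keys = list(range(n))
    print(n, lookup_steps(keys, n - 1))   # look up the last key
# 1000 (True, 10)
# 1000000 (True, 20)
```

The data grew a thousandfold while the step count only doubled, but that logarithmic cost is paid again on every single lookup.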
Joins Kill Performance
Joins are executed every time
you cross relationships
Querying millions of records
while joining 3-4 tables can
generate billions of
combinations
This is why the database
query performance
suffers as the database
increases in size
[Figure: RDBMS performance on traversal, O(log N)]
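The JOIN cost can be sketched the same way. This minimal Python example copies the rows of the Customer, CustomerAddress and Address tables and answers "which cities is customer 10 linked to?" with one index lookup per hop. `SortedIndex` is a hypothetical stand-in for a B-tree (sorted keys searched with `bisect`, so each lookup is O(log N)); it is not how any particular RDBMS implements indexes.

```python
from bisect import bisect_left

# Rows copied from the Customer, CustomerAddress and Address tables.
customer = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customer_address = [(10, 24), (10, 33), (32, 44)]   # (CustomerID, AddressID)
address = [(24, "Milan"), (33, "London"), (18, "Paris"),
           (18, "Madrid"), (44, "Moscow")]


class SortedIndex:
    """Hypothetical index on the first column: O(log N) search with bisect."""

    def __init__(self, rows):
        self.rows = sorted(rows, key=lambda row: row[0])
        self.keys = [row[0] for row in self.rows]
        self.lookups = 0

    def find(self, key):
        self.lookups += 1
        i = bisect_left(self.keys, key)
        matches = []
        while i < len(self.keys) and self.keys[i] == key:
            matches.append(self.rows[i])
            i += 1
        return matches


indexes = {name: SortedIndex(rows) for name, rows in
           [("customer", customer), ("customer_address", customer_address),
            ("address", address)]}


def customer_cities(customer_id):
    """Customer -> CustomerAddress -> Address, recomputed at query time."""
    names = [name for _, name in indexes["customer"].find(customer_id)]
    cities = []
    for _, address_id in indexes["customer_address"].find(customer_id):
        for _, location in indexes["address"].find(address_id):
            cities.append(location)
    return names, cities


print(customer_cities(10))                          # (['John'], ['Milan', 'London'])
print(sum(ix.lookups for ix in indexes.values()))   # 4
```

Each extra table in the chain adds another O(log N) lookup per matching row, and all of this work is repeated on every query.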
In a world that’s becoming
more connected, we need a
better way to store data and
manage relationships
Read: Data is important, but relationships are even more fundamental today
“A graph database is any
storage system
that provides
index-free adjacency”
- Marko Rodriguez (author of TinkerPop Blueprints)
Every developer knows
the Relational Model,
but who knows the
Graph one?
Back to school:
Graph Theory crash course
Basic Graph
[Figure: Luigi --Visited--> Lyon]
Edges are directed
Property Graph Model*
Vertices and Edges can have properties
[Figure: Luigi (company: Orient Technologies) --Visited (on: 2015)--> Lyon (people: 500,000)]
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
1-N and N-M Relationships
An Edge connects only 2 vertices
Use multiple edges to represent 1-N and N-M relationships
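A minimal Python sketch of the Property Graph Model: vertices and edges are plain objects carrying property maps, each edge is directed and joins exactly two vertices, and a 1-N relationship is simply several edges leaving the same vertex. The class names and the second city (London) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: Vertex   # edges are directed: out_v -> in_v
    in_v: Vertex
    props: dict = field(default_factory=dict)


luigi = Vertex("Luigi", {"company": "Orient Technologies"})
lyon = Vertex("Lyon", {"people": 500_000})
visited = Edge("Visited", luigi, lyon, {"on": 2015})

# An edge joins exactly two vertices, so a 1-N relationship is modelled
# with N edges sharing the same outgoing vertex (London is hypothetical).
london = Vertex("London")
edges = [visited, Edge("Visited", luigi, london)]

visited_by_luigi = [e.in_v.name for e in edges
                    if e.out_v is luigi and e.label == "Visited"]
print(visited_by_luigi)   # ['Lyon', 'London']
```

An N-M relationship follows the same pattern: more vertices on each side, one edge per connected pair.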
Congrats! This is your diploma in
«Graph Theory»
The Graph theory
is so simple,
yet so
powerful
How does a true*
Graph Database
manage relationships?
*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB
[Figure: two vertices, Luigi and Lyon (Record IDs #13:55 and #15:99), connected by an edge with Record ID #22:11]
Each element in the
Graph has its own
immutable Record ID
Connections use
persistent
pointers
A Graph Database creates the
relationship just once
(when the edge is created)
VS
RDBMS computes the
relationship every time
you query a database
When you move from an RDBMS
to a Graph Database you jump
from O(log N) to near O(1) per traversed relationship
With a Graph Database, the
cost of traversing a relationship is
not affected by database size!
This is huge in the Big Data age
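Record IDs and persistent pointers can be sketched in a few lines of Python. This is a hypothetical layout, not OrientDB's actual storage format: each cluster is a list, a Record ID is (cluster, position), and loading a record is a direct offset into that list. The cluster numbers 13, 15 and 22 are borrowed from the slide's example, while positions here start at 0. The edge is linked to its vertices once, when it is created; traversal afterwards only follows stored RIDs, with no index search whose depth grows with the number of records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RID:
    """Record ID in the #cluster:position form used on the slides."""
    cluster: int
    position: int

    def __str__(self):
        return f"#{self.cluster}:{self.position}"


class Storage:
    """Hypothetical storage: a RID is a direct offset into a cluster list."""

    def __init__(self):
        self.clusters = {}

    def save(self, cluster, record):
        records = self.clusters.setdefault(cluster, [])
        records.append(record)
        return RID(cluster, len(records) - 1)

    def load(self, rid):
        return self.clusters[rid.cluster][rid.position]   # no index search


db = Storage()
luigi = db.save(13, {"name": "Luigi", "out": []})
lyon = db.save(15, {"name": "Lyon", "in": []})

# The relationship is materialized once, when the edge is created: the edge
# stores both endpoint RIDs and each vertex stores the edge's RID.
edge = db.save(22, {"label": "Visited", "out": luigi, "in": lyon})
db.load(luigi)["out"].append(edge)
db.load(lyon)["in"].append(edge)


def visited_places(person_rid):
    """Traverse by following stored pointers: no JOIN, no index lookup."""
    places = []
    for edge_rid in db.load(person_rid)["out"]:
        e = db.load(edge_rid)
        if e["label"] == "Visited":
            places.append(db.load(e["in"])["name"])
    return places


print(luigi, edge, lyon)       # #13:0 #22:0 #15:0
print(visited_places(luigi))   # ['Lyon']
```

Adding a million unrelated records to other positions would not change the number of steps `visited_places` takes; only the number of edges on Luigi's vertex does.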
Graph Databases Easily Manage Complex Relationships
No JOIN cost to traverse relationships:
• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering
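[Figure: example graph linking John to the genres Thriller and Comedy, the films Pulp Fiction and Mr Bean, Theater A, Theater B and Theater C, and the cities NYC and San José, with edges such as "Lives in"]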
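A recommendation query over such a graph can be sketched as a pure traversal in Python. The slide names the vertices and a "Lives in" edge, but the other edge labels and connections below are hypothetical: the sketch suggests films in genres John likes that are shown at a theater in his city, starting from John and following only each vertex's own edge lists, without scanning any table.

```python
from collections import defaultdict

# Adjacency lists in both directions: vertex -> edge label -> neighbours.
out_edges = defaultdict(lambda: defaultdict(list))
in_edges = defaultdict(lambda: defaultdict(list))


def add_edge(src, label, dst):
    out_edges[src][label].append(dst)
    in_edges[dst][label].append(src)


# Hypothetical edges (only "Lives in" appears on the slide).
add_edge("John", "Lives in", "NYC")
add_edge("John", "Likes", "Thriller")
add_edge("Pulp Fiction", "Genre", "Thriller")
add_edge("Mr Bean", "Genre", "Comedy")
add_edge("Pulp Fiction", "Shown at", "Theater A")
add_edge("Pulp Fiction", "Shown at", "Theater B")
add_edge("Mr Bean", "Shown at", "Theater C")
add_edge("Theater A", "Located in", "NYC")
add_edge("Theater B", "Located in", "San José")
add_edge("Theater C", "Located in", "NYC")


def recommend(person):
    """(film, theater) pairs: liked genre <- film -> theater -> person's city."""
    cities = set(out_edges[person]["Lives in"])
    picks = []
    for genre in out_edges[person]["Likes"]:
        for film in in_edges[genre]["Genre"]:
            for theater in out_edges[film]["Shown at"]:
                if cities & set(out_edges[theater]["Located in"]):
                    picks.append((film, theater))
    return picks


print(recommend("John"))   # [('Pulp Fiction', 'Theater A')]
```

The work done depends on how many edges John and his neighbours have, not on how many films or theaters the database holds in total.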
GraphDB Database Quadrant
[Figure: database categories plotted by Data Complexity (horizontal axis) and Relationships Complexity (vertical axis): Relational, Key Value, Column, Graph, Document]
These were 1st generation NoSQL
products, where each tool was
only good at a few use cases
1st Generation NoSQL: Scenario
[Figure: an Application uses Oracle (RDBMS) as the Primary DB, with ETL feeding Redis or Memcache (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB)]
1st Generation NoSQL: Fact
In > 90% of use cases,
NoSQL products are
used as a second DBMS
1st Generation NoSQL: Problems
- No standard between NoSQL
products
- Multiple vendors = multiple skills
- ETL + synchronization code
is costly to write and maintain
- Performance and Reliability are
hard to predict
2nd Generation NoSQL
is
Multi-Model
What’s a Multi-Model DBMS?
[Figure: overlapping Graph, Document, Object and Key/Value models]
Multi-Model represents the
intersection
of multiple models in just one
product
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the
beginning
Multi-Model domain schema
Vertex classes:
• Actor (name: string, surname: string), inherited by Customer and Provider
• Product (name: string, qty: int)
• Order (number: int, date: datetime)
Edge classes:
• Sells (price: decimal): Provider → Product
• Makes: Customer → Order
• Has (price: decimal): Order → Product
Vertices and Edges are Documents
{
  "@rid": "#12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and
querying
• Developer friendly
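A minimal Python sketch of the schema-hybrid idea, using the Frank document from the slide. `CUSTOMER_SCHEMA` and `validate` are hypothetical names, not OrientDB's API: declared properties are type-checked (the schema-full part), while undeclared fields such as `phone` or the nested `details` document pass through untouched (the schema-less part).

```python
import json

# Hypothetical schema-hybrid class: declared properties are type-checked,
# any other field is accepted as-is.
CUSTOMER_SCHEMA = {"name": str, "surname": str}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document is valid."""
    problems = []
    for prop, expected in schema.items():
        if prop not in doc:
            problems.append(f"missing {prop}")
        elif not isinstance(doc[prop], expected):
            problems.append(f"{prop} should be {expected.__name__}")
    return problems


frank = json.loads("""
{
  "@rid": "#12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
""")

print(validate(frank, CUSTOMER_SCHEMA))        # []
print(frank["details"]["city"])                 # London (nested document)
print(validate({"name": 42}, CUSTOMER_SCHEMA))  # ['name should be str', 'missing surname']
```

Declaring no properties at all gives a schema-less class; declaring every field and rejecting extras would make it schema-full.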
Polymorphic queries
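Customer and Provider both inherit from Actor, so a query on Actor also returns their records:
SELECT * FROM Customer → Frank (Customer)
SELECT * FROM Provider → John (Provider), Bruno (Provider)
SELECT * FROM Actor → John (Provider), Frank (Customer), Bruno (Provider)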
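Polymorphic queries behave like class inheritance in an object-oriented language. A minimal Python sketch, assuming (as in the domain schema) that Customer and Provider inherit from Actor; `select_from` is a hypothetical helper that mimics the semantics of `SELECT * FROM <class>` with a linear scan, not how OrientDB executes the query.

```python
class Actor:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"{self.name}({type(self).__name__})"


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Provider("John"), Customer("Frank"), Provider("Bruno")]


def select_from(cls):
    """Polymorphic 'SELECT * FROM cls': records of subclasses are included."""
    return [r for r in records if isinstance(r, cls)]


print(select_from(Customer))   # [Frank(Customer)]
print(select_from(Provider))   # [John(Provider), Bruno(Provider)]
print(select_from(Actor))      # [John(Provider), Frank(Customer), Bruno(Provider)]
```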
Multi-Model complex domains schema
[Figure: vertex classes Band, Genre, Account, MusicTaste and Location; edge classes Likes, Performs and Plays; the legend marks vertices, edges and inheritance]
Multi-Model complex domains
[Figure: Snow Patrol (Band) Performs at 123, 1st Street, Austin, TX (Location) on April 7, 2015, 9pm-11.30pm; Snow Patrol Plays Indie (Genre); the Accounts John and Frank have Likes edges to bands and genres such as Indie and Rock]
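Multi-Model Database Quadrant
[Figure: database categories plotted by Data Complexity (horizontal axis) and Relationships Complexity (vertical axis): Relational, Key Value, Column, Graph and Document, with Multi-Model added next to Graph]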
Multi-Model Solutions
There are a few DBMSs that claim
to be Multi-Model, but they do not
have a true Graph Engine.
The “Graph” is only a layer on top
of the engine.
Under the hood they do JOINs,
which means traversal time is
affected by database size.
Meet OrientDB
The First Ever Multi-Model
Database Combining Flexibility
of Documents with
Connectedness of Graphs
With a true Graph, Document,
Key/Value and Object Oriented engine
OrientDB features
DEMO
API & Standards
• Support for the TinkerPop standard
for Graph DBs: the Gremlin language
and the Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python,
PHP, .NET, Perl, C/C++ and more
Availability and Integrity
• Atomic, Consistent, Isolated and Durable (ACID)
multi-statement transactions
[Figure: Multi-master Replication, with two Master Nodes each serving several clients (C)]
Scalability and Performance
• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops
• 200k+ TPS on Commodity Hardware
[Figure: two Master Nodes with their clients (C), joined by an Auto-Discovered Node]
Some numbers
A Bright Future
Graph DBMSs increased in popularity by 500% within the last 2 years
Document DBMSs are the 3rd fastest-growing category
Some of Our Customers
Get Started for Free
OrientDB Community Edition is FREE
for any purpose (Apache 2 license)
Udemy Getting Started Training is
★★★★★ and Free
http://www.orientechnologies.com/getting-started
OrientDB Enterprise is Free for
Development