Big Data: Document and Graph DBs - CouchDB and OrientDB
Big Data
NoSQL Database Types: episode II
Content
▪ Document Store
▪ Graph DB
Graph
Graph DB
▪ Why Graph DB
▪ OrientDB
▪ OrientDB vs Neo4J
Graph DB: Why
Around for a long time, in some form
Graph DB: Why
Can it handle complexity?
▪ Key/Value
▪ Column Store
▪ Document Store
cannot handle relations
▪ Graph Database!
Graph DB: RDBMS relations
Customer Address
Graph DB: 1 to 1
Customer
id  name     address_id
1   Tom VdB  5
2   Tom C.   4
3   Andriy   2
Address
id  address
2   Antwerp
4   Brussels
5   Essen
Graph DB: 1 to N
Customer
id  name
1   Tom
2   Andriy
3   Jos
Address
id  customer  location
1   3         Antwerp
2   3         Brussels
3   1         Rome
Graph DB: N to M
Customer
id  name
1   Tom
2   Andriy
3   Jos
CustomerAddress
customer  address
3         1
3         5
2         1
Address
id  location
1   Antwerp
5   Brussels
Graph DB: what is wrong
Graph DB: The join
These joins are all executed every time you traverse the relationship
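To make the cost of that traversal concrete, here is a minimal sketch that resolves a customer's locations through the N to M tables by scanning each table in turn. The rows mirror the slide; the variable and function names are illustrative assumptions, not part of the original deck.

```python
# Rows copied from the N to M slide; names are illustrative only.
customers = [
    {"id": 1, "name": "Tom"},
    {"id": 2, "name": "Andriy"},
    {"id": 3, "name": "Jos"},
]
customer_address = [
    {"customer": 3, "address": 1},
    {"customer": 3, "address": 5},
    {"customer": 2, "address": 1},
]
addresses = [
    {"id": 1, "location": "Antwerp"},
    {"id": 5, "location": "Brussels"},
]


def locations_of(name):
    """Resolve customer -> join table -> address by scanning each table."""
    result = []
    for customer in customers:
        if customer["name"] != name:
            continue
        for link in customer_address:
            if link["customer"] != customer["id"]:
                continue
            for address in addresses:
                if address["id"] == link["address"]:
                    result.append(address["location"])
    return result
```

Calling `locations_of("Jos")` repeats all three scans on every call; nothing is remembered between traversals.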
Graph DB: what is wrong
A join means searching for a key in another table.
To improve performance, one adds indexes.
But indexes slow down inserts, updates and deletes.
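The trade-off can be sketched with a toy index: a sorted list of (key, row position) pairs. This is an assumption-laden simplification (real databases use tree structures, and `SortedIndex` is a hypothetical name), but it shows both sides: lookups become a binary search, and every write has to keep the index ordered.

```python
import bisect


class SortedIndex:
    """A toy index: a sorted list of (key, row_position) pairs."""

    def __init__(self):
        self.entries = []

    def insert(self, key, row_position):
        # Every write must also keep the index ordered: extra work per insert.
        bisect.insort(self.entries, (key, row_position))

    def lookup(self, key):
        # Binary search to the first entry with this key, then collect matches.
        i = bisect.bisect_left(self.entries, (key,))
        positions = []
        while i < len(self.entries) and self.entries[i][0] == key:
            positions.append(self.entries[i][1])
            i += 1
        return positions


# Index the "customer" column of the CustomerAddress rows (3, 3, 2).
index = SortedIndex()
for position, customer_id in enumerate([3, 3, 2]):
    index.insert(customer_id, position)
```

Here `index.lookup(3)` finds both rows for customer 3 without scanning the whole table, at the price of the ordered insert above.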
Graph DB: index lookup
Index tree:
A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L
  M-Z
    M-R
    S-Z
Looking up "Jos": A-Z, A-L, E-L, H-L, H-J, Jos
Graph DB: index lookup
Now imagine billions of records
Graph DB: index lookup
This lookup is executed for every table involved, multiplied by the number of scanned records
Graph DB: What about document databases
{ "_id": 1, "name": "Tom", "address_id": 4 }
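A rough sketch of why a document like this does not remove the problem: `address_id` is only a value, so following it still means a second lookup by key. The `addresses` collection and the `address_of` helper are hypothetical, added purely for illustration; address 4 is taken from the 1 to 1 slide.

```python
# Two hypothetical collections, keyed by _id.
customers = {1: {"_id": 1, "name": "Tom", "address_id": 4}}
addresses = {4: {"_id": 4, "address": "Brussels"}}


def address_of(customer_id):
    customer = customers[customer_id]          # first lookup by key
    return addresses[customer["address_id"]]   # second lookup by key
```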
Graph DB: Is there a better way
“A graph database is any storage system that provides index-free adjacency.”
Marko Rodriguez, author of TinkerPop Blueprints
Graph DB: Is there a better way
Index-free relationships?
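One way to picture index-free adjacency is a structure where each record holds direct references to its neighbours, so following a relationship needs no key search. The sketch below is a minimal in-memory illustration with made-up class names (`Vertex`, `Edge`), not how any particular database lays out its storage.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references, no index in between
        self.in_edges = []


class Edge:
    def __init__(self, label, source, target):
        self.label = label
        self.source = source
        self.target = target
        # Register the edge on both endpoints at creation time.
        source.out_edges.append(self)
        target.in_edges.append(self)


def neighbours(vertex, label):
    """Follow outgoing edges with the given label straight to their targets."""
    return [edge.target for edge in vertex.out_edges if edge.label == label]


tom, essen = Vertex("Tom"), Vertex("Essen")
Edge("lives in", tom, essen)
```

The cost of `neighbours(tom, "lives in")` depends on how many edges Tom has, not on how many records exist overall.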
Graph DB: Back to school
Graph DB: Back to School
Tom --lives in--> Essen
Tom: “I am a vertex”
Tom and Essen: “We are vertices”
lives in: an edge
Graph DB: Back to School
Tom (vertex): firstname: Tom, surname: VdB, company: Ordina
Essen (vertex): population: 17000
lives in (edge): since: 1982
Graph DB: Back to School
1 to N relationships
Tom --lives in (since: 1982)--> Essen
Tom --walked in (when: 1990, 1992)--> Essen
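The slides above can be sketched as plain data: vertices and edges both carry properties, and two vertices may be connected by more than one edge. The dictionary layout and the `edges_between` helper are illustrative assumptions; the property values come from the slides.

```python
vertices = {
    "Tom": {"firstname": "Tom", "surname": "VdB", "company": "Ordina"},
    "Essen": {"population": 17000},
}
edges = [
    {"label": "lives in", "from": "Tom", "to": "Essen", "since": 1982},
    {"label": "walked in", "from": "Tom", "to": "Essen", "when": [1990, 1992]},
]


def edges_between(source, target):
    """All edges from source to target, whatever their label."""
    return [e for e in edges if e["from"] == source and e["to"] == target]
```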
Graph DB: Back to School
Graph Example
Vertices: Tom, Ordina, meetup: bigdata.be
Edges: isMemberOf, Works For, Hosted By, VisitedOffice
Graph DB: Back to school
Congratulations - you have now graduated in graph theory
GraphDB: Index Lookup vs Relations
Graph DB: OrientDB
▪ How does OrientDB manage relationships
▪ Some Limits
▪ Hybrid
▪ Transactions and ACID
▪ Create the Graph
▪ Query vs Traversal
▪ Schema
OrientDB: Manage Relationships
Tom (Vertex)
Rid: #13:35
Label: “customer”
Name: Tom
Essen (Vertex)
Rid: #13:100
Label: “city”
Name: Essen
OrientDB: Manage Relationships
Tom (Vertex)
Rid: #13:35
Label: “customer”
Name: Tom
out: #14:3
Essen (Vertex)
Rid: #13:100
Label: “city”
Name: Essen
in: #14:3
Lives in (Edge)
Rid: #14:3
Label: “Lives in”
out: #13:35
in: #13:100
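The record layout on this slide can be mimicked with a dictionary keyed by record ID. This is only a sketch of the idea, not OrientDB's storage format or API. It assumes RIDs written in OrientDB's `#cluster:position` form and assumes OrientDB's convention that an edge's `out` field is the source vertex and its `in` field is the target vertex; the `follow_out` helper is hypothetical.

```python
# Records keyed by RID; each link is itself a RID, so following it is a
# direct fetch by position rather than a search through an index.
records = {
    "#13:35": {"label": "customer", "name": "Tom", "out": ["#14:3"]},
    "#13:100": {"label": "city", "name": "Essen", "in": ["#14:3"]},
    "#14:3": {"label": "Lives in", "out": "#13:35", "in": "#13:100"},
}


def follow_out(rid):
    """Vertex -> its outgoing edges -> the vertex each edge points in to."""
    targets = []
    for edge_rid in records[rid].get("out", []):
        edge = records[edge_rid]
        targets.append(records[edge["in"]])
    return targets
```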
OrientDB: Some Limits
Databases
Clusters
Records per cluster (Edges, Vertices and Documents)
Records per database
Record Size
Document Properties
Indexes
Queries
Concurrency Level
OrientDB: Class - Records - Cluster
OrientDB: Hybrid Model
OrientDB: Transactions and ACID
OrientDB: Create the Graph - SQL
OrientDB: Create the Graph - Java
OrientDB: Query vs Traversal
Calendar
  Year 2014
    Month 12/2014
      Day: 1 Dec 2014
      Day: 6 Dec 2014
Order vertices: Order 1, Order 2, Order 3
Edge labels: Orders, Special Order
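A small sketch of the difference, using the calendar shape from this slide. Which order falls on which day is not recoverable from the slide, so the assignment below is an assumption, as are the function names. The query inspects every order; the traversal walks Calendar, Year, Month, Day and only touches the orders attached to that day.

```python
from datetime import date

# Assumed assignment of orders to days, for illustration only.
orders = [
    {"name": "Order 1", "day": date(2014, 12, 1)},
    {"name": "Order 2", "day": date(2014, 12, 1)},
    {"name": "Order 3", "day": date(2014, 12, 6)},
]

# Calendar -> Year -> Month -> Day -> orders linked to that day.
calendar = {2014: {12: {1: [], 6: []}}}
for order in orders:
    d = order["day"]
    calendar[d.year][d.month][d.day].append(order)


def query_orders(day):
    """Query style: look at every order and keep the matching ones."""
    return [o["name"] for o in orders if o["day"] == day]


def traverse_orders(day):
    """Traversal style: follow the links down the calendar tree."""
    return [o["name"] for o in calendar[day.year][day.month][day.day]]
```

Both functions return the same orders; the traversal's work depends on the path length and the orders on that day rather than on the total number of orders.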
OrientDB: Schema
▪ schema-full
▪ schema-mixed
▪ schema-less
OrientDB: Schema Design
Vertices: Jos, Tom, André
Edges: Sends Email to, Sends Email to
OrientDB: Schema Design
Vertices: Jos, Tom, André, Email
Edges: sends, TO, CC
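The two designs can be compared with edge triples. Who sends to whom is not recoverable from the slides, so the sender and recipients below are assumptions, and `email-1` is a made-up identifier. In the first model an email to two people becomes two unrelated edges; in the second the email is its own vertex, so TO and CC recipients hang off the same message.

```python
# Model 1: the email is an edge between two people.
edge_model = [
    ("Jos", "Sends Email to", "Tom"),
    ("Jos", "Sends Email to", "André"),
]

# Model 2: the email is a vertex with its own outgoing TO and CC edges.
vertex_model = [
    ("Jos", "sends", "email-1"),
    ("email-1", "TO", "Tom"),
    ("email-1", "CC", "André"),
]


def recipients(edges, email, kind):
    """Targets of the given email vertex for one edge label (TO or CC)."""
    return [target for source, label, target in edges
            if source == email and label == kind]
```

With the vertex model, `recipients(vertex_model, "email-1", "CC")` is answerable; the edge model has no place to record that distinction without adding properties to each edge.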
OrientDB: Gremlin
Pipeline of steps
▪ transform
▪ filter
▪ sideEffect
▪ branch
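The four step kinds can be imitated with Python generators. This is not Gremlin syntax or its API, only a sketch of the pipeline idea under simplified assumptions: each step lazily consumes the previous one, and the function names are invented for the example.

```python
def transform_step(fn, items):
    """Emit fn(item) for every incoming item."""
    for item in items:
        yield fn(item)


def filter_step(predicate, items):
    """Let through only the items that satisfy the predicate."""
    for item in items:
        if predicate(item):
            yield item


def side_effect_step(fn, items):
    """Call fn for its effect, then pass the item on unchanged."""
    for item in items:
        fn(item)
        yield item


def branch_step(predicate, if_true, if_false, items):
    """Route each item through one of two functions."""
    for item in items:
        yield if_true(item) if predicate(item) else if_false(item)


names = ["Jos", "Tom", "André", "Andriy"]
seen = []
pipeline = transform_step(
    str.upper,
    side_effect_step(seen.append,
                     filter_step(lambda n: n.startswith("A"), names)),
)
result = list(pipeline)  # nothing runs until the pipeline is consumed
branched = list(branch_step(lambda n: len(n) == 3, str.lower, str.upper, names))
```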
Graph DB: Use Cases
▪ Recommendation engines
▪ Ranking/Credibility
▪ Path Finding (shortest, longest, mutual friends)
▪ Social (friendship, following, key connectors)
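Two of these use cases, shortest path and mutual friends, fit in a few lines on a small friendship graph. The friendships below are invented for illustration, and the breadth-first search is a generic textbook approach, not something specific to OrientDB.

```python
from collections import deque

# Hypothetical, symmetric friendships.
friends = {
    "Tom": {"Jos", "Andriy"},
    "Jos": {"Tom", "André"},
    "Andriy": {"Tom", "André"},
    "André": {"Jos", "Andriy"},
}


def shortest_path(start, goal):
    """Breadth-first search: the first path reaching goal has the fewest hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(friends[path[-1]]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def mutual_friends(a, b):
    """People who are friends with both a and b."""
    return friends[a] & friends[b]
```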
Some code to play with
1. Go to https://github.com/tomvdbulck/orientdb_initiation
2. Make sure the following items have been installed on your machine:
o Java 7 or higher
o Git (if you like a pretty interface to deal with git, try SourceTree)
o Maven
3. Install VirtualBox https://www.virtualbox.org/wiki/Downloads
4. Install Vagrant https://www.vagrantup.com/downloads.html
5. Clone the repository into your workspace
6. Open a command prompt, go to the vagrant folder and run
   vagrant up
7. This will start up the vagrant box. The first time will take a while (approx. 5 min) as it has to download the OS image and other dependencies.
Want More?
Even More?
Upcoming meetup on 17/06 - @ Ordina
1st meetup of Spark Belgium http://www.meetup.com/Spark-Belgium/events/222632697/
Want More?
Upcoming meetup hosted @ Ordina on Wednesday 24/06 - Neo4j
http://www.meetup.com/graphdb-belgium/events/222504421
Even More?
Upcoming workshop on 2/7 - @ Ordina
Introduction to Hadoop and its zoo
Questions or Suggestions?