Big data document and graph DBs - CouchDB and OrientDB

Big Data NoSQL Database Types: episode II

Upload: jworks-powered-by-ordina

Post on 22-Jan-2017

Category:

Software


TRANSCRIPT

Page 1: Big data  document and graph d bs - couch-db and orientdb

Big Data

NoSQL Database Types: episode II

Page 2: Big data  document and graph d bs - couch-db and orientdb

Content

▪ Document Store
▪ Graph DB

Page 3: Big data  document and graph d bs - couch-db and orientdb

Graph

Page 4: Big data  document and graph d bs - couch-db and orientdb

Graph DB

▪ Why Graph DB
▪ OrientDB
▪ OrientDB vs Neo4J

Page 5: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Why

Around for a long time

In some form

Page 6: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Why

Page 7: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Why

Can it handle complexity?

▪ Key/Value
▪ Column Store
▪ Document Store

cannot handle relations

▪ Graph Database!

Page 8: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Why

Page 9: Big data  document and graph d bs - couch-db and orientdb

Graph DB: RDBMS relations

Customer Address

Page 10: Big data  document and graph d bs - couch-db and orientdb

Graph DB: 1 to 1

Address
id | address
2  | Antwerp
4  | Brussels
5  | Essen

Customer
id | name    | address_id
1  | Tom VdB | 5
2  | Tom C.  | 4
3  | Andriy  | 2

Page 11: Big data  document and graph d bs - couch-db and orientdb

Graph DB: 1 to N

Customer
id | name
1  | Tom
2  | Andriy
3  | Jos

Address
id | customer | location
1  | 3        | Antwerp
2  | 3        | Brussels
3  | 1        | Rome

Page 12: Big data  document and graph d bs - couch-db and orientdb

Graph DB: N to M

Customer
id | name
1  | Tom
2  | Andriy
3  | Jos

CustomerAddress
customer | address
3        | 1
3        | 5
2        | 1

Address
id | location
1  | Antwerp
5  | Brussels

Page 13: Big data  document and graph d bs - couch-db and orientdb

Graph DB: what is wrong

Page 14: Big data  document and graph d bs - couch-db and orientdb

Graph DB: The join

Customer
id | name
1  | Tom
2  | Andriy
3  | Jos

CustomerAddress
customer | address
3        | 1
3        | 5
2        | 1

Address
id | location
1  | Antwerp
5  | Brussels

These joins are all executed every time you traverse the relationship
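To make that cost concrete, here is a minimal Python sketch of the N-to-M lookup, using the rows from the slide. The function name `addresses_of` and the list-of-tuples layout are assumptions made for illustration; a real RDBMS stores and scans data differently. The point is only that every traversal repeats both joins.

```python
# Rows copied from the slide; the in-memory layout is an illustrative assumption.
customers = [(1, "Tom"), (2, "Andriy"), (3, "Jos")]
customer_address = [(3, 1), (3, 5), (2, 1)]  # (customer id, address id)
addresses = [(1, "Antwerp"), (5, "Brussels")]


def addresses_of(name):
    """Resolve customer -> locations by joining through the link table."""
    locations = []
    for cust_id, cust_name in customers:
        if cust_name != name:
            continue
        # First join: scan the link table for this customer's rows.
        for link_customer, link_address in customer_address:
            if link_customer != cust_id:
                continue
            # Second join: scan the address table for the matching id.
            for addr_id, location in addresses:
                if addr_id == link_address:
                    locations.append(location)
    return locations


print(addresses_of("Jos"))
```

Without an index, each call scans the link table and the address table again, which is the work the following slides try to avoid.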

Page 15: Big data  document and graph d bs - couch-db and orientdb

Graph DB: what is wrong

Chris De Bruyne
I have seen the film, but I don't understand what this series of photos has to do with the subject of the slides. Maybe I should watch it again?
Tom Van den Bulck
Rather to accompany the story in the meeting minutes: 1. the question; 2. the quest (the search); 3. the solution: indexes (the little black knight); 4. but that one does not let itself be adapted so easily
Page 16: Big data  document and graph d bs - couch-db and orientdb

Graph DB: what is wrong

A join means searching for a key in another table

In order to improve performance one adds indexing

But that slows down inserts, updates and deletes

Page 17: Big data  document and graph d bs - couch-db and orientdb

Graph DB: index lookup

A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J
        K-L
  M-Z
    M-R
    S-Z

Looking up Jos: A-Z > A-L > E-L > H-L > H-J > Jos
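The lookup in the tree above can be approximated with a binary search over a sorted list of keys. This is a simplified sketch, not how a database B-tree is implemented; `index_lookup` and `worst_case_steps` are hypothetical helper names.

```python
def index_lookup(sorted_keys, key):
    """Binary search over a sorted index; returns (position, steps taken)."""
    lo, hi, steps = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == key:
            return mid, steps
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


def worst_case_steps(record_count):
    """Worst-case number of halving steps for a positive record count."""
    return record_count.bit_length()  # equals floor(log2(n)) + 1


names = sorted(["Andriy", "André", "Jos", "Tom"])
print(index_lookup(names, "Jos"))
print(worst_case_steps(10**9))
```

Each lookup stays cheap, but the number of steps grows with the size of the index, and a join pays that price for every record it touches.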

Page 18: Big data  document and graph d bs - couch-db and orientdb

Graph DB: index lookup

Now

Imagine

billions of records

Page 19: Big data  document and graph d bs - couch-db and orientdb

Graph DB: index lookup

This join is executed for every table involved, multiplied by the number of scanned records

Page 20: Big data  document and graph d bs - couch-db and orientdb

Graph DB: What about document databases

{ "_id": 1, "name": "Tom", "address_id": 4 }
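A document that only stores `address_id` still needs a second lookup to reach the address. The sketch below assumes two hypothetical collections modelled as Python dicts; the address document and the helper `city_of` are invented for illustration.

```python
# Two hypothetical collections keyed by _id; the address document is made up.
customers = {1: {"_id": 1, "name": "Tom", "address_id": 4}}
addresses = {4: {"_id": 4, "city": "Brussels"}}


def city_of(customer_id):
    """Follow the reference by hand: two separate key lookups."""
    customer = customers[customer_id]             # first lookup
    address = addresses[customer["address_id"]]   # second lookup
    return address["city"]


print(city_of(1))
```

The relation lives in application code here, so the store itself knows nothing about how the two documents are connected.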

Page 21: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Is there a better way

“A graph database is any storage system that provides index-free adjacency”

Marko Rodriguez

“author of TinkerPop Blueprints”

Page 22: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Is there a better way

Index-free relationships?

Page 23: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to school

Page 24: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to School

Tom --lives in--> Essen

Tom: I am a Vertex
Tom and Essen: We are vertices

lives in: An Edge

Page 25: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to School

Tom (vertex)
firstname: Tom
surname: VdB
company: Ordina

Essen (vertex)
population: 17000

lives in (edge)
since: 1982
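The slide describes a property graph: vertices and edges that both carry key/value properties. A minimal sketch of that idea in Python, with class and field names chosen for illustration only:

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # where the edge starts
    in_vertex: Vertex   # where the edge ends
    properties: dict = field(default_factory=dict)


tom = Vertex("Tom", {"firstname": "Tom", "surname": "VdB", "company": "Ordina"})
essen = Vertex("Essen", {"population": 17000})
lives_in = Edge("lives in", tom, essen, {"since": 1982})

print(lives_in.out_vertex.label, lives_in.label, lives_in.in_vertex.label)
```

Because the edge holds direct references to both vertices, getting from Tom to Essen needs no search at all.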

Page 26: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to School

1 to N relationships

Tom --lives in--> Essen
since: 1982

Tom --walked in--> Essen
when: 1990, 1992

Page 27: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to School

Graph Example

Tom

Ordina

isMemberOf

Works For

meetup: bigdata.be

Hosted By

VisitedOffice

Page 28: Big data  document and graph d bs - couch-db and orientdb

Graph DB: Back to school

Congratulations - you have now graduated in graph theory

Page 29: Big data  document and graph d bs - couch-db and orientdb

GraphDB: Index Lookup vs Relations

Page 30: Big data  document and graph d bs - couch-db and orientdb

GraphDB: Index Lookup vs Relations

Page 31: Big data  document and graph d bs - couch-db and orientdb

Graph DB: OrientDB

▪ How does OrientDB manage relationships
▪ Some Limits
▪ Hybrid
▪ Transactions and ACID
▪ Create the Graph
▪ Query vs Traversal
▪ Schema

Page 32: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Manage Relationships

Tom (Vertex)
Rid: #13.35
Label: “customer”
Name: Tom

Essen (Vertex)
Rid: #13.100
Label: “city”
Name: Essen

Page 33: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Manage Relationships

Tom (Vertex)
Rid: #13.35
Label: “customer”
Name: Tom
out: #14.3

Essen (Vertex)
Rid: #13.100
Label: “city”
Name: Essen
in: #14.3

Lives in (Edge)
Rid: #14.3
Label: “Lives in”
In: #13.35
Out: #13.100
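The record layout above can be sketched as records that point at each other by record id. This is an illustration only: the dict stands in for OrientDB's record storage, and the edge fields `source` and `target` are names chosen here, not OrientDB's own field names.

```python
# Records keyed by the record ids shown on the slide.
# The dict is a stand-in for physical record positions (assumption).
records = {
    "#13.35": {"label": "customer", "name": "Tom", "out": ["#14.3"]},
    "#13.100": {"label": "city", "name": "Essen", "in": ["#14.3"]},
    "#14.3": {"label": "Lives in", "source": "#13.35", "target": "#13.100"},
}


def neighbours(rid, edge_label):
    """Follow outgoing edges of one label by hopping from record to record."""
    result = []
    for edge_rid in records[rid].get("out", []):
        edge = records[edge_rid]
        if edge["label"] == edge_label:
            result.append(records[edge["target"]]["name"])
    return result


print(neighbours("#13.35", "Lives in"))
```

The traversal only touches the records on the path, so its cost depends on how many edges Tom has, not on how many records the database holds.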

Page 34: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Some Limits

Databases

Clusters

Records per cluster (Edges, Vertices and Documents)

Records per database

Record Size

Document Properties

Chris De Bruyne
Clusters: multi-master, masterless, master-slave? I would elaborate on this a bit, because the different options are
Chris De Bruyne
What is a database? The physical server or the OrientDB instance? Why would you use 1000 databases (hence the question: what is a database)?
Tom Van den Bulck
A database is the same as ... a database on Oracle. On an OrientDB instance you can, just as with Oracle, create several databases, each with its own schema
Page 35: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Some Limits

Indexes

Queries

Concurrency Level

Chris De Bruyne
Exclusive lock on what? DB, "table", node?
Tom Van den Bulck
Rather that transactions come in and are processed one at a time. OrientDB uses optimistic locking: on write it checks whether the current state is the same as the state you had before you started your transaction
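Optimistic locking, as mentioned in the reply above, can be sketched with a version number per record. This is a generic illustration of the technique, not OrientDB's implementation; `Record`, `read`, `write` and `ConcurrentModificationError` are hypothetical names.

```python
class ConcurrentModificationError(Exception):
    """Raised when a record changed after the writer read it."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0


def read(record):
    """Return the value together with the version the reader saw."""
    return record.value, record.version


def write(record, new_value, expected_version):
    """Apply the write only if nobody else has written since the read."""
    if record.version != expected_version:
        raise ConcurrentModificationError(
            f"expected version {expected_version}, found {record.version}"
        )
    record.value = new_value
    record.version += 1


city = Record("Essen")
_, seen_by_a = read(city)
_, seen_by_b = read(city)
write(city, "Antwerp", seen_by_a)       # first writer succeeds
try:
    write(city, "Brussels", seen_by_b)  # stale version, rejected
except ConcurrentModificationError as error:
    print("retry needed:", error)
```

No lock is held between read and write; the losing writer has to re-read the record and retry.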
Page 36: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Class - Records - Cluster

Page 37: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Hybrid Model

Page 38: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Transactions and ACID

Page 39: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Transactions and ACID

Page 40: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Transactions and ACID

Page 41: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Create the Graph - SQL

Page 42: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Create the Graph - Java

Chris De Bruyne
A small Spring Data example? Or was there only support for Neo4J?
Tom Van den Bulck
Just a simple example in Java. I am not going to use Spring Data in workshops right away; I would rather first explain things to people with the drivers supplied by the makers. OrientDB has built a Spring Data implementation, though: https://github.com/orientechnologies/spring-data-orientdb
Page 43: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Query vs Traversal

Order 1

Order 2

Order 3

Calendar

Year 2014

Month 12/2014

Day: 1 dec 2014

Day: 6 dec 2014

Special Order Orders

Page 44: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Schema

▪ schema full

▪ schema-mixed

▪ schema-less

Page 45: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Schema Design

Jos

Tom

André

Sends Email to

Sends Email to

Page 46: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Schema Design

Jos

Tom

André

Emailsends

TO

CC
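The two designs can be compared with plain edge triples. The sketch assumes that Jos is the sender, Tom the TO recipient and André the CC recipient; the slides do not say who sends what, and `email-1` and `recipients` are made-up names.

```python
# Edges as (source, label, target) triples. Sender and recipients are assumed.
email_as_edge = [
    ("Jos", "Sends Email to", "Tom"),
    ("Jos", "Sends Email to", "André"),
]

email_as_vertex = [
    ("Jos", "sends", "email-1"),
    ("email-1", "TO", "Tom"),
    ("email-1", "CC", "André"),
]


def recipients(edges, email, kind):
    """List the targets reached from one email vertex over one edge label."""
    return [target for source, label, target in edges
            if source == email and label == kind]


print(recipients(email_as_vertex, "email-1", "TO"))
print(recipients(email_as_vertex, "email-1", "CC"))
```

With the email as an edge there is nothing to hang TO and CC on, and one message to two people becomes two unrelated edges. Promoting the email to a vertex keeps it as a single record.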

Page 47: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Gremlin

Page 48: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Gremlin

Pipeline of steps
▪ transform
▪ filter
▪ sideEffect
▪ branch
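The four kinds of step can be imitated with Python generators. This is not Gremlin and uses no Gremlin API; it only sketches how items flow lazily through a chain of steps. All function names and the sample data are assumptions, and only the population of Essen comes from the slides.

```python
def transform(items, fn):
    """transform: turn each item into something else."""
    for item in items:
        yield fn(item)


def filter_step(items, predicate):
    """filter: let only matching items through."""
    for item in items:
        if predicate(item):
            yield item


def side_effect(items, action):
    """sideEffect: do something with each item, then pass it on unchanged."""
    for item in items:
        action(item)
        yield item


def branch(items, predicate, if_true, if_false):
    """branch: send each item down one of two paths."""
    for item in items:
        yield if_true(item) if predicate(item) else if_false(item)


# Sample vertices; Smallville and Bigtown are invented.
cities = [
    {"name": "Essen", "population": 17000},
    {"name": "Smallville", "population": 500},
    {"name": "Bigtown", "population": 500000},
]

seen = []
steps = transform(cities, lambda c: (c["name"], c["population"]))
steps = filter_step(steps, lambda pair: pair[1] > 10000)
steps = side_effect(steps, seen.append)
steps = branch(steps, lambda pair: pair[1] > 100000,
               lambda pair: pair[0].upper(), lambda pair: pair[0])
result = list(steps)
print(result)
```

Nothing runs until `list(steps)` pulls items through the chain, which mirrors the way a traversal pipeline is evaluated step by step.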

Page 49: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Gremlin

Page 50: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Gremlin

Chris De Bruyne
Cooool,
Page 51: Big data  document and graph d bs - couch-db and orientdb

OrientDB: Gremlin

Chris De Bruyne
What are n1, n2, n3?
Tom Van den Bulck
The top 3 graph DBs
Page 52: Big data  document and graph d bs - couch-db and orientdb

Graphdb: Use Cases

▪ Recommendation engines

▪ Ranking/Credibility

▪ Path Finding (shortest, longest, mutual friends)

▪ Social (friendship, following, key connectors)
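Two of these use cases, shortest path and mutual friends, fit in a few lines over an adjacency map. The friendship graph below is invented for illustration, and the breadth-first search is a generic textbook approach rather than what any particular graph database runs internally.

```python
from collections import deque

# Hypothetical friendship graph, stored as adjacency sets.
friends = {
    "Tom": {"Jos", "Andriy"},
    "Jos": {"Tom", "André"},
    "Andriy": {"Tom", "André"},
    "André": {"Jos", "Andriy"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def mutual_friends(graph, a, b):
    """People directly connected to both a and b."""
    return graph[a] & graph[b]


print(shortest_path(friends, "Tom", "André"))
print(mutual_friends(friends, "Tom", "André"))
```

Both functions only walk outward from the starting vertices, which is the access pattern graph databases are built for.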

Page 53: Big data  document and graph d bs - couch-db and orientdb

Some code to play with

1. Go to https://github.com/tomvdbulck/orientdb_initiation

2. Make sure the following items have been installed on your machine:

o Java 7 or higher

o Git (if you like a pretty interface to deal with git, try SourceTree)

o Maven

3. Install VirtualBox https://www.virtualbox.org/wiki/Downloads

4. Install Vagrant https://www.vagrantup.com/downloads.html

5. Clone the repository into your workspace

6. Open a command prompt, go to the vagrant folder and run

vagrant up

7. This will start up the vagrant box. The first time will take a while (approx. 5 min) as it has to download the OS image and other dependencies.

Page 54: Big data  document and graph d bs - couch-db and orientdb

Want More?

Page 55: Big data  document and graph d bs - couch-db and orientdb

Even More?

Upcoming meetup on 17/06 - @ Ordina

1st meetup of Spark Belgium http://www.meetup.com/Spark-Belgium/events/222632697/

Page 56: Big data  document and graph d bs - couch-db and orientdb

Want More?

Upcoming meetup hosted @ Ordina on Wednesday 24/06 - Neo4j

http://www.meetup.com/graphdb-belgium/events/222504421

Page 57: Big data  document and graph d bs - couch-db and orientdb

Even More?

Upcoming workshop on 2/7 - @ Ordina

Introduction to Hadoop and its zoo

Page 58: Big data  document and graph d bs - couch-db and orientdb

Questions or Suggestions?