odessapy2013 - graph databases and python

Graph databases and Python. Maksym Klymyshyn, CTO @ GVMachines Inc. (zakaz.ua)

Uploaded by maksym-klymyshyn on 08-Sep-2014

DESCRIPTION

Page 10: "Я из Одессы, я просто бухаю." Translation: "I'm from Odessa, I just drink," meaning he drinks a lot of vodka ^_^ (@tuc @hackernews). This is a local meme used when someone asks you a question and you would look stupid if you didn't have an answer.

TRANSCRIPT

Page 1: Odessapy2013 - Graph databases and Python

Graph databases and Python

Maksym Klymyshyn CTO @ GVMachines Inc. (zakaz.ua)

Page 2

What’s inside?

‣ PostgreSQL

‣ Neo4j

‣ ArangoDB

Page 3

Python Frameworks

‣ Bulbflow

‣ py2neo

‣ NetworkX

‣ Arango-python

Page 4

Relational to Graph model crash course

“Switching from relational to the graph model”

by Luca Garulli

http://goo.gl/z08qwk

http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model

Page 5

My motivation is quite simple:

Page 6

–Norbert Wiener

“The best material model of a cat is another, or preferably the same, cat.”

Page 7
Page 8

Good old Postgres

Page 9

create table nodes (
  node  integer primary key,
  name  varchar(10) not null,
  feat1 char(1),
  feat2 char(1)
);

create table edges (
  a integer not null references nodes(node) on update cascade on delete cascade,
  b integer not null references nodes(node) on update cascade on delete cascade,
  primary key (a, b)
);

create index a_idx on edges(a);
create index b_idx on edges(b);

-- each pair of nodes may be connected at most once, in either direction
create unique index pair_unique_idx on edges (least(a, b), greatest(a, b));

-- and no self-loops
alter table edges add constraint no_self_loops_chk check (a <> b);

insert into nodes values (1, 'node1', 'x', 'y');
insert into nodes values (2, 'node2', 'x', 'w');
insert into nodes values (3, 'node3', 'x', 'w');
insert into nodes values (4, 'node4', 'z', 'w');
insert into nodes values (5, 'node5', 'x', 'y');
insert into nodes values (6, 'node6', 'x', 'z');
insert into nodes values (7, 'node7', 'x', 'y');

insert into edges values (1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
                         (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

-- directed graph: successors of node 2
select * from nodes n left join edges e on n.node = e.b where e.a = 2;

-- undirected graph: neighbours of node 1
select * from nodes
where node in (select case when a = 1 then b else a end
               from edges where 1 in (a, b));
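To try this adjacency-table model from Python without a Postgres server, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in. Assumptions: SQLite replaces Postgres, so LEAST/GREATEST become the two-argument min()/max(), and the self-loop check moves into CREATE TABLE because SQLite has no ALTER TABLE ... ADD CONSTRAINT. The helpers build, successors and neighbours are hypothetical names. The directed query uses a plain JOIN, which returns the same rows as the slide's LEFT JOIN because the WHERE clause on e.a already discards unmatched rows.

```python
import sqlite3

# SQLite stand-in for the Postgres schema on this slide (assumption: used only
# so the sketch runs without a server). LEAST/GREATEST become min()/max() and
# the self-loop check lives in CREATE TABLE.
SCHEMA = """
create table nodes (
  node  integer primary key,
  name  varchar(10) not null,
  feat1 char(1),
  feat2 char(1)
);
create table edges (
  a integer not null references nodes(node) on update cascade on delete cascade,
  b integer not null references nodes(node) on update cascade on delete cascade,
  primary key (a, b),
  check (a <> b)
);
create index a_idx on edges(a);
create index b_idx on edges(b);
create unique index pair_unique_idx on edges (min(a, b), max(a, b));
"""

NODES = [(1, "node1", "x", "y"), (2, "node2", "x", "w"), (3, "node3", "x", "w"),
         (4, "node4", "z", "w"), (5, "node5", "x", "y"), (6, "node6", "x", "z"),
         (7, "node7", "x", "y")]
EDGES = [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5), (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)]


def build(conn):
    """Create the two tables and load the sample graph."""
    conn.execute("pragma foreign_keys = on")
    conn.executescript(SCHEMA)
    conn.executemany("insert into nodes values (?, ?, ?, ?)", NODES)
    conn.executemany("insert into edges values (?, ?)", EDGES)


def successors(conn, node):
    """Directed graph: nodes reached by one outgoing edge (a -> b) from node."""
    rows = conn.execute(
        "select n.node from nodes n join edges e on n.node = e.b where e.a = ?",
        (node,))
    return sorted(r[0] for r in rows)


def neighbours(conn, node):
    """Undirected graph: nodes sharing an edge with node, in either column."""
    rows = conn.execute(
        "select node from nodes where node in ("
        " select case when a = ? then b else a end from edges where ? in (a, b))",
        (node, node))
    return sorted(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
build(conn)
print(successors(conn, 2))  # [1, 4]
print(neighbours(conn, 1))  # [2, 3, 5, 6]
```

Against a real Postgres database the same two SELECTs could be sent through a driver such as psycopg2; mainly the placeholder style (%s instead of ?) would change.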

Page 10

“Я из Одессы, я просто бухаю.” (I'm from Odessa, I just drink.)

Page 11

Neo4j

Page 12

The most famous graph database.

• 1,333 mentions in GitHub repositories

• 1,140,000 results in Google

• 26,868 tweets

• Really nice admin interface

• Awesome help tips

Page 13

Py2Neo, Neomodel, neo4django, bulbflow

A lot of Python libraries

Page 14

// Create node1 and node2 and a RELATED
// relationship between the two nodes
CREATE (node1 {name: "node1"}),
       (node2 {name: "node2"}),
       (node1)-[:RELATED]->(node2);
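The Cypher above hard-codes the node names. From Python, a sketch using the official neo4j driver (an assumption: the talk itself uses py2neo, and execute_write is the 5.x driver API) would pass them as query parameters instead. The helper create_related and the connection details in the comments are hypothetical placeholders.

```python
# Parameterised version of the Cypher on this slide: $a and $b are filled in
# by the driver, so names are never spliced into the query string.
CREATE_RELATED = (
    "CREATE (node1 {name: $a}), (node2 {name: $b}), "
    "(node1)-[:RELATED]->(node2)"
)


def create_related(tx, name_a, name_b):
    """Transaction function (hypothetical helper): create two named nodes
    and a RELATED relationship from the first to the second."""
    tx.run(CREATE_RELATED, a=name_a, b=name_b)


# Usage, assuming the official neo4j driver (5.x) and a local server;
# the URI and credentials are placeholders:
#
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687",
#                               auth=("neo4j", "password"))
# with driver.session() as session:
#     session.execute_write(create_related, "node1", "node2")
# driver.close()
```

Sending values as parameters rather than formatting them into the string also lets the server reuse a cached query plan for every call.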

Page 15
Page 16

Neo4j is friendly and powerful. The only drawback is its somewhat complex query language, Cypher.

Page 17

# py2neo 1.x-era API
from py2neo import neo4j, node, rel

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")

# rel(i, TYPE, j) refers to the i-th and j-th entities
# passed to the same create() call
die_hard = graph_db.create(
    node(name="Bruce Willis"),
    node(name="John McClane"),
    node(name="Alan Rickman"),
    node(name="Hans Gruber"),
    node(name="Nakatomi Plaza"),
    rel(0, "PLAYS", 1),
    rel(2, "PLAYS", 3),
    rel(1, "VISITS", 4),
    rel(3, "STEALS_FROM", 4),
    rel(1, "KILLS", 3))

py2neo nodes
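In the call above, rel(0, "PLAYS", 1) points at nodes by their position in the same create() call. The pure-Python sketch below (no py2neo and no server involved; resolve and related are hypothetical helpers) shows how those index-based tuples map onto named triples that can then be queried.

```python
# Plain stand-ins for the node(...) and rel(...) arguments on this slide.
nodes = ["Bruce Willis", "John McClane", "Alan Rickman",
         "Hans Gruber", "Nakatomi Plaza"]
rels = [(0, "PLAYS", 1), (2, "PLAYS", 3), (1, "VISITS", 4),
        (3, "STEALS_FROM", 4), (1, "KILLS", 3)]


def resolve(nodes, rels):
    """Replace (start_index, type, end_index) with (start_name, type, end_name)."""
    return [(nodes[start], rel_type, nodes[end]) for start, rel_type, end in rels]


def related(triples, rel_type):
    """All (start, end) name pairs joined by a relationship of rel_type."""
    return [(start, end) for start, t, end in triples if t == rel_type]


triples = resolve(nodes, rels)
print(related(triples, "PLAYS"))
# [('Bruce Willis', 'John McClane'), ('Alan Rickman', 'Hans Gruber')]
print(related(triples, "KILLS"))
# [('John McClane', 'Hans Gruber')]
```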

Page 18

from py2neo import neo4j, node

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")

alice, bob, carol = node(name="Alice"), \
                    node(name="Bob"), \
                    node(name="Carol")

# a path alternates node, relationship type, node, ...
abc = neo4j.Path(alice, "KNOWS", bob, "KNOWS", carol)
abc.create(graph_db)
abc.nodes
# [node(**{'name': 'Alice'}),
#  node(**{'name': 'Bob'}),
#  node(**{'name': 'Carol'})]

py2neo paths

Page 19

Alice KNOWS Bob KNOWS Carol
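A path like the one on this slide is an alternating sequence: node, relationship, node, and so on. The class below is a hypothetical minimal model of that idea, not py2neo's neo4j.Path; it splits its arguments into nodes and relationships and renders the sentence shown above.

```python
class Path:
    """Hypothetical minimal path: node, rel, node, rel, ..., node."""

    def __init__(self, *items):
        # A valid path has an odd number of items: it starts and ends with a node.
        if len(items) % 2 == 0:
            raise ValueError("a path must start and end with a node")
        self.nodes = list(items[0::2])          # positions 0, 2, 4, ...
        self.relationships = list(items[1::2])  # positions 1, 3, ...

    def __str__(self):
        parts = [self.nodes[0]]
        for rel, node in zip(self.relationships, self.nodes[1:]):
            parts += [rel, node]
        return " ".join(parts)


abc = Path("Alice", "KNOWS", "Bob", "KNOWS", "Carol")
print(abc.nodes)  # ['Alice', 'Bob', 'Carol']
print(abc)        # Alice KNOWS Bob KNOWS Carol
```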

Page 20

from bulbs.neo4jserver import Graph

g = Graph()
james = g.vertices.create(name="James")
julie = g.vertices.create(name="Julie")
g.edges.create(james, "knows", julie)

bulbflow framework

Page 21

FlockDB, OrientDB, InfoGrid, HyperGraphDB

WAT?

Page 22

ArangoDB

Page 23

–Michael Jordan

“In any investment, you expect to have fun and make profit.”

Page 24

I’m the developer of the Python driver for ArangoDB.

Page 25

• NoSQL database storage

• Graph of documents

• AQL (ArangoDB Query Language) to execute graph queries

• Edge data type to create edges (with properties) between nodes

• Multiple edge collections to keep different kinds of edges

• Support for the Gremlin graph query language

Page 26

A small experiment with graphs and Twitter:

I looked at my tweets and at the people who added them to their favorites.

Then I looked at each of those people's tweets and did the same with the people who favorited them.

Page 27

1-level depth

Page 28

2-level depth

Page 29

3-level depth
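The 1-, 2- and 3-level crawls above amount to a depth-limited breadth-first search. A minimal sketch, assuming the favorites have already been fetched into a dict that maps each user to the users who favorited that user's tweets; the sample data and the crawl function are hypothetical and are not the talk's scraper.

```python
from collections import deque

# Hypothetical sample data: user -> users who favorited that user's tweets.
favoriters = {
    "me": ["ann", "bob"],
    "ann": ["carl"],
    "bob": ["ann", "dora"],
    "carl": ["eve"],
}


def crawl(favoriters, start, max_depth):
    """Collect (author, fan) edges reachable within max_depth levels of start."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # users on the last level are not expanded
        for fan in favoriters.get(user, []):
            edges.append((user, fan))
            if fan not in seen:  # expand each user only once
                seen.add(fan)
                queue.append((fan, depth + 1))
    return edges


for depth in (1, 2, 3):
    print(depth, crawl(favoriters, "me", depth))
```

Tracking seen users keeps the crawl from fetching the same person's tweets twice when two people favorite each other.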

Page 30

Code behind

from arango import create

# create the database, a document collection for tweets
# and an edge collection for the links between them
arango = create(db="tweets_maxmaxmaxmax")
arango.database.create()
arango.tweets.create()
arango.tweets_edges.create(
    type=arango.COLLECTION_EDGES)

Page 31

from_doc = arango.tweets.documents.create({})
to_doc = arango.tweets.documents.create({})
arango.tweets_edges.edges.create(from_doc, to_doc)

from arango.aql import F, V

query = arango.tweets_edges.query.over(
    F.EDGES(
        "tweets_edges",
        ~V("tweets/196297127"),
        ~V("outbound")))

Here we create an edge from from_doc to to_doc.

Getting the outbound edges of tweet 196297127
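In ArangoDB an edge is itself a document whose _from and _to attributes hold the handles (collection/key) of the two vertices it connects. As a rough in-memory sketch of what an EDGES(collection, vertex, direction) lookup selects (the function edges_for and every tweet handle except tweets/196297127 are made up for illustration):

```python
# Hypothetical edge documents; only tweets/196297127 comes from the slide.
tweets_edges = [
    {"_from": "tweets/196297127", "_to": "tweets/1001"},
    {"_from": "tweets/1002", "_to": "tweets/196297127"},
    {"_from": "tweets/196297127", "_to": "tweets/1003"},
]


def edges_for(edge_docs, vertex, direction="any"):
    """Return the edges touching vertex, filtered by direction
    ("outbound", "inbound" or "any"), roughly like AQL's EDGES()."""
    if direction not in ("outbound", "inbound", "any"):
        raise ValueError("unknown direction: {}".format(direction))
    result = []
    for edge in edge_docs:
        outgoing = edge["_from"] == vertex
        incoming = edge["_to"] == vertex
        if (direction == "outbound" and outgoing) or \
           (direction == "inbound" and incoming) or \
           (direction == "any" and (outgoing or incoming)):
            result.append(edge)
    return result


for e in edges_for(tweets_edges, "tweets/196297127", "outbound"):
    print(e["_from"], "->", e["_to"])
```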

Page 32

Full example

• Sample dataset with 10 users

• Relations between users

• Visualise within admin interface

Page 33

Sample dataset

from arango import create


def dataset(a):
    a.database.create()
    a.users.create()
    a.knows.create(type=a.COLLECTION_EDGES)

    for u in range(10):
        a.users.documents.create({
            "name": "user_{}".format(u),
            "age": u + 20,
            "gender": u % 2 == 0})


a = create(db="experiments")
dataset(a)

Page 34

Relations between users

from arango import create


def relations(a):
    rels = (
        (0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

    def get_user(uid):
        # look a user document up by its "name" attribute
        return a.users.query.filter(
            "obj.name == 'user_{}'".format(uid)).execute().first

    for f, t in rels:
        what = "user_{} knows user_{}".format(f, t)
        from_doc, to_doc = get_user(f), get_user(t)
        a.knows.edges.create(from_doc, to_doc, {"what": what})
        print("{}->{}: {}".format(from_doc.id, to_doc.id, what))


a = create(db="experiments")
relations(a)

Page 35

Relations between users

users/2744664487->users/2744926631: user_0 knows user_1
users/2744664487->users/2745123239: user_0 knows user_2
users/2745123239->users/2745319847: user_2 knows user_3
users/2745516455->users/2745319847: user_4 knows user_3
users/2745319847->users/2745713063: user_3 knows user_5
users/2745713063->users/2744926631: user_5 knows user_1
users/2744664487->users/2745713063: user_0 knows user_5
users/2745713063->users/2745909671: user_5 knows user_6
users/2745909671->users/2746106279: user_6 knows user_7
users/2746106279->users/2746302887: user_7 knows user_8
users/2746499495->users/2746302887: user_9 knows user_8

Page 36
Page 37

AQL, getting paths

FOR p IN PATHS(users, knows, 'outbound')
  FILTER p.source.name == 'user_5'
  RETURN p.vertices[*].name

from arango import create
from arango.aql import F, V


def querying(a):
    # the AQL query above, built from Python
    for data in a.knows.query.over(
            F.PATHS("users", "knows", ~V("outbound")))\
            .filter("obj.source.name == '{}'".format("user_5"))\
            .result("obj.vertices[*].name")\
            .execute(wrapper=lambda c, i: i):
        print(data)


a = create(db="experiments")
querying(a)

Page 38

Paths output

['user_5']
['user_5', 'user_1']
['user_5', 'user_6']
['user_5', 'user_6', 'user_7']
['user_5', 'user_6', 'user_7', 'user_8']
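To check this output without a running server, here is a pure-Python sketch that rebuilds the Page 34 relations, enumerates outbound paths from every vertex, and keeps those whose source is user_5. It is a stand-in for AQL's PATHS, not its implementation; all_paths is a hypothetical helper, and it simply refuses to revisit a vertex already on the current path, which only matters for cyclic graphs (the sample data has no cycles).

```python
def all_paths(vertices, edges):
    """Every outbound path (including the zero-length one) from every vertex."""
    out = {v: [] for v in vertices}
    for f, t in edges:
        out[f].append(t)  # keep insertion order, like the rels tuple
    paths = []

    def walk(path):
        paths.append({"source": path[0], "vertices": list(path)})
        for nxt in out[path[-1]]:
            if nxt not in path:  # never loop back onto the current path
                walk(path + [nxt])

    for v in vertices:
        walk([v])
    return paths


users = ["user_{}".format(i) for i in range(10)]
rels = [(0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8)]
knows = [("user_{}".format(f), "user_{}".format(t)) for f, t in rels]

# Same filter as the AQL: FILTER p.source.name == 'user_5'
for p in all_paths(users, knows):
    if p["source"] == "user_5":
        print(p["vertices"])
```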

Page 39

Links

• Arango paths: http://goo.gl/n2L3SK

• Neo4j: http://goo.gl/au5y9I

• Scraper: http://goo.gl/nvMFGk

• Visualiser: http://goo.gl/Rzdwci

Page 40

Thanks. Questions?

@maxmaxmaxmax