odessapy2013 - graph databases and python

Post on 08-Sep-2014

DESCRIPTION

Page 10: "Я из Одессы, я просто бухаю." Translation: "I'm from Odessa, I just drink," meaning he drinks a lot of vodka ^_^ (@tuc @hackernews). This is a local meme, used when someone asks a question and you would look stupid because you don't have an answer.

TRANSCRIPT

Graph databases and Python

Maksym Klymyshyn CTO @ GVMachines Inc. (zakaz.ua)

What’s inside?

‣ PostgreSQL

‣ Neo4j

‣ ArangoDB

Python Frameworks

‣ Bulbflow

‣ py2neo

‣ NetworkX

‣ Arango-python

Relational to Graph model crash course

“Switching from relational to the graph model”

by Luca Garulli

http://goo.gl/z08qwk

http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model

My motivation is quite simple:

–Norbert Wiener

“The best material model of a cat is another, or preferably the same, cat.”

Good old Postgres

create table nodes (
    node integer primary key,
    name varchar(10) not null,
    feat1 char(1),
    feat2 char(1));

create table edges (
    a integer not null references nodes(node)
        on update cascade on delete cascade,
    b integer not null references nodes(node)
        on update cascade on delete cascade,
    primary key (a, b));

create index a_idx on edges(a);
create index b_idx on edges(b);

create unique index pair_unique_idx
    on edges (LEAST(a, b), GREATEST(a, b));

-- and no self-loops
alter table edges add constraint no_self_loops_chk check (a <> b);

insert into nodes values (1, 'node1', 'x', 'y');
insert into nodes values (2, 'node2', 'x', 'w');
insert into nodes values (3, 'node3', 'x', 'w');
insert into nodes values (4, 'node4', 'z', 'w');
insert into nodes values (5, 'node5', 'x', 'y');
insert into nodes values (6, 'node6', 'x', 'z');
insert into nodes values (7, 'node7', 'x', 'y');

insert into edges values
    (1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
    (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

-- directed graph
select * from nodes n
    left join edges e on n.node = e.b
    where e.a = 2;

-- undirected graph
select * from nodes
    where node in (
        select case when a = 1 then b else a end
        from edges where 1 in (a, b));
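The two queries above can be tried without a PostgreSQL server. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for Postgres, with a reduced schema (no feature columns, no LEAST/GREATEST index). The function names successors and neighbours are illustrative and not part of the original slides.

```python
import sqlite3

# Assumption: in-memory SQLite replaces PostgreSQL so the sketch needs no server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table nodes (node integer primary key, name varchar(10) not null);
    create table edges (
        a integer not null references nodes(node),
        b integer not null references nodes(node),
        primary key (a, b),
        check (a <> b)
    );
""")
conn.executemany(
    "insert into nodes values (?, ?)",
    [(i, "node{}".format(i)) for i in range(1, 8)],
)
conn.executemany(
    "insert into edges values (?, ?)",
    [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
     (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)],
)


def successors(node_id):
    """Directed view: nodes at the end of an edge that starts at node_id."""
    rows = conn.execute(
        "select n.node from nodes n join edges e on n.node = e.b "
        "where e.a = ? order by n.node",
        (node_id,),
    )
    return [row[0] for row in rows]


def neighbours(node_id):
    """Undirected view: nodes sharing an edge with node_id in either direction."""
    rows = conn.execute(
        "select node from nodes where node in ("
        "  select case when a = ? then b else a end"
        "  from edges where ? in (a, b)) order by node",
        (node_id, node_id),
    )
    return [row[0] for row in rows]


print(successors(2))
print(neighbours(1))
```

The directed query only follows edges whose first column matches, while the undirected one treats (a, b) and (b, a) as the same edge, which is why the slides add a unique index over the ordered pair.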

“I'm from Odessa, I just drink.” (Я из Одессы, я просто бухаю.)

Neo4j

Most famous graph database.

• 1,333 mentions within repositories on Github
• 1,140,000 results in Google
• 26,868 tweets
• Really nice Admin interface
• Awesome help tips

Py2Neo, Neomodel, neo4django, bulbflow

A lot of python libraries

// Create node1, node2 and
// a RELATED relationship between the two nodes
CREATE (node1 {name: "node1"}),
       (node2 {name: "node2"}),
       (node1)-[:RELATED]->(node2);

Neo4j is friendly and powerful. The only drawback is its somewhat complex query language, Cypher.

from py2neo import neo4j, node, rel

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")

die_hard = graph_db.create(
    node(name="Bruce Willis"),
    node(name="John McClane"),
    node(name="Alan Rickman"),
    node(name="Hans Gruber"),
    node(name="Nakatomi Plaza"),
    rel(0, "PLAYS", 1),
    rel(2, "PLAYS", 3),
    rel(1, "VISITS", 4),
    rel(3, "STEALS_FROM", 4),
    rel(1, "KILLS", 3))

py2neo nodes
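In the create() call above, the integers passed to rel(...) refer to the nodes by their position in the same call. The following plain-Python sketch only illustrates how those indices map to names. It does not use py2neo or a Neo4j server, and the names `names`, `rels`, and `resolve` are hypothetical helpers.

```python
# Plain-Python illustration only: no py2neo import, no Neo4j server.
# `names`, `rels`, and `resolve` are hypothetical names for this sketch.
names = ["Bruce Willis", "John McClane", "Alan Rickman",
         "Hans Gruber", "Nakatomi Plaza"]
rels = [(0, "PLAYS", 1), (2, "PLAYS", 3), (1, "VISITS", 4),
        (3, "STEALS_FROM", 4), (1, "KILLS", 3)]


def resolve(names, rels):
    """Replace positional indices with the node names they point at."""
    return [(names[start], rel_type, names[end])
            for start, rel_type, end in rels]


triples = resolve(names, rels)
for start, rel_type, end in triples:
    print("{} -[{}]-> {}".format(start, rel_type, end))
```

Read this way, rel(1, "KILLS", 3) links the second node (John McClane) to the fourth (Hans Gruber).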

from py2neo import neo4j, node

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")
alice, bob, carol = node(name="Alice"), \
                    node(name="Bob"), \
                    node(name="Carol")
abc = neo4j.Path(
    alice, "KNOWS", bob, "KNOWS", carol)
abc.create(graph_db)
abc.nodes
# [node(**{'name': 'Alice'}),
#  node(**{'name': 'Bob'}),
#  node(**{'name': 'Carol'})]

py2neo paths

Alice KNOWS Bob KNOWS Carol

from bulbs.neo4jserver import Graph

g = Graph()
james = g.vertices.create(name="James")
julie = g.vertices.create(name="Julie")
g.edges.create(james, "knows", julie)

bulbflow framework

FlockDB OrientDB InfoGrid

HyperGraphDB

WAT?

ArangoDB

–Michael Jordan

“In any investment, you expect to have fun and make profit.”

I’m the developer of a Python driver for ArangoDB

• NoSQL Database storage

• Graph of documents

• AQL (ArangoDB Query Language) to execute graph queries

• Edge data type to create edges between nodes (with properties)

• Multiple edge collections to keep different kinds of edges

• Support for the Gremlin graph query language
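The "graph of documents" idea from the list above can be pictured with plain dictionaries. This is only an in-memory sketch, not the ArangoDB API: the `_from` and `_to` attributes mirror how ArangoDB edge documents reference vertices, while the collections `knows` and `follows`, their properties, and the helpers `outbound` and `inbound` are made up for illustration.

```python
# Hypothetical in-memory stand-in for two edge collections.
# Only the _from/_to attributes mirror ArangoDB edge documents;
# the collection names, properties, and helpers are invented here.
knows = [
    {"_from": "users/1", "_to": "users/2", "since": 2010},
    {"_from": "users/2", "_to": "users/3", "since": 2012},
]
follows = [
    {"_from": "users/3", "_to": "users/1", "muted": False},
]


def outbound(edge_collection, vertex_id):
    """Handles of vertices reached by edges leaving vertex_id."""
    return [edge["_to"] for edge in edge_collection
            if edge["_from"] == vertex_id]


def inbound(edge_collection, vertex_id):
    """Handles of vertices whose edges point at vertex_id."""
    return [edge["_from"] for edge in edge_collection
            if edge["_to"] == vertex_id]


print(outbound(knows, "users/1"))
print(inbound(follows, "users/1"))
```

Keeping each kind of relation in its own edge collection means a query can choose which relation to traverse simply by choosing the collection.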

A small experiment with graphs and Twitter:

I looked at my tweets and the people who added them to favorites.

After that I looked at those people’s tweets and did the same thing with the people who favorited their tweets.

1-level depth

2-level depth

3-level depth
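The level-by-level expansion described above amounts to a depth-limited breadth-first crawl. Here is a minimal sketch under stated assumptions: the `FAVORITED_BY` mapping is invented sample data standing in for real Twitter lookups, and `crawl` is a hypothetical helper, not the scraper used for the talk.

```python
from collections import deque

# Hypothetical data: user -> people who favorited that user's tweets.
# A real crawl would fetch this from Twitter instead.
FAVORITED_BY = {
    "me": ["ann", "bob"],
    "ann": ["cat", "bob"],
    "bob": ["dan"],
    "cat": ["eve"],
    "dan": [],
    "eve": ["me"],
}


def crawl(start, max_depth):
    """Breadth-first expansion; returns (edges, depth of each user reached)."""
    depth = {start: 0}
    edges = []
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand users at the last requested level
        for fan in FAVORITED_BY.get(user, []):
            edges.append((fan, user))  # edge: fan favorited user's tweet
            if fan not in depth:
                depth[fan] = depth[user] + 1
                queue.append(fan)
    return edges, depth


for level in (1, 2, 3):
    edges, depth = crawl("me", level)
    print(level, len(depth), len(edges))
```

Each extra level expands only the users discovered at the previous level, which is why the graph grows quickly as the depth increases.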

Code behind

from arango import create

arango = create(db="tweets_maxmaxmaxmax")
arango.database.create()
arango.tweets.create()
arango.tweets_edges.create(
    type=arango.COLLECTION_EDGES)

from_doc = arango.tweets.documents.create({})
to_doc = arango.tweets.documents.create({})
arango.tweets_edges.edges.create(from_doc, to_doc)

from arango.aql import F, V

query = arango.tweets_edges.query.over(
    F.EDGES(
        "tweets_edges",
        ~V("tweets/196297127"),
        ~V("outbound")))

Here we create an edge from from_doc to to_doc

Getting edges for tweet 196297127

Full example

• Sample dataset with 10 users
• Relations between users
• Visualise within the admin interface

Sample dataset

from arango import create


def dataset(a):
    a.database.create()
    a.users.create()
    a.knows.create(type=a.COLLECTION_EDGES)

    for u in range(10):
        a.users.documents.create({
            "name": "user_{}".format(u),
            "age": u + 20,
            "gender": u % 2 == 0})


a = create(db="experiments")
dataset(a)

Relations between users

from arango import create


def relations(a):
    rels = (
        (0, 1), (0, 2), (2, 3), (4, 3), (3, 5),
        (5, 1), (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

    # Look up a user document by its generated name
    get_user = lambda id: a.users.query.filter(
        "obj.name == 'user_{}'".format(id)).execute().first

    for f, t in rels:
        what = "user_{} knows user_{}".format(f, t)
        from_doc, to_doc = get_user(f), get_user(t)
        a.knows.edges.create(from_doc, to_doc, {"what": what})
        print("{}->{}: {}".format(from_doc.id, to_doc.id, what))


a = create(db="experiments")
relations(a)

Relations between users

users/2744664487->users/2744926631: user_0 knows user_1
users/2744664487->users/2745123239: user_0 knows user_2
users/2745123239->users/2745319847: user_2 knows user_3
users/2745516455->users/2745319847: user_4 knows user_3
users/2745319847->users/2745713063: user_3 knows user_5
users/2745713063->users/2744926631: user_5 knows user_1
users/2744664487->users/2745713063: user_0 knows user_5
users/2745713063->users/2745909671: user_5 knows user_6
users/2745909671->users/2746106279: user_6 knows user_7
users/2746106279->users/2746302887: user_7 knows user_8
users/2746499495->users/2746302887: user_9 knows user_8

AQL, getting paths

FOR p IN PATHS(users, knows, 'outbound')
    FILTER p.source.name == 'user_5'
    RETURN p.vertices[*].name

from arango import create
from arango.aql import F, V


def querying(a):
    for data in a.knows.query.over(
        F.PATHS("users", "knows", ~V("outbound")))\
            .filter("obj.source.name == '{}'".format("user_5"))\
            .result("obj.vertices[*].name")\
            .execute(wrapper=lambda c, i: i):
        print(data)


a = create(db="experiments")
querying(a)

Paths output

['user_5']
['user_5', 'user_1']
['user_5', 'user_6']
['user_5', 'user_6', 'user_7']
['user_5', 'user_6', 'user_7', 'user_8']
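To see where these five paths come from, the same relations can be walked in plain Python. This is only a sketch of outbound path enumeration from one start vertex, not how ArangoDB evaluates PATHS. The helper `outbound_paths` is hypothetical, and the edge list is the `rels` tuple from the sample dataset.

```python
# Edge list copied from the relations() example: (from_user, to_user).
RELS = ((0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))


def outbound_paths(start, rels):
    """Yield every outbound path beginning at start, depth-first in edge order."""
    adjacency = {}
    for src, dst in rels:
        adjacency.setdefault(src, []).append(dst)

    def walk(path):
        yield path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # guard against cycles
                yield from walk(path + [nxt])

    yield from walk([start])


paths = [["user_{}".format(n) for n in path]
         for path in outbound_paths(5, RELS)]
for path in paths:
    print(path)
```

user_5 has two outgoing edges (to user_1 and user_6). Only the user_6 branch continues, through user_7 to user_8, which gives the five paths listed.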

Links

• Arango paths: http://goo.gl/n2L3SK
• Neo4j: http://goo.gl/au5y9I
• Scraper: http://goo.gl/nvMFGk
• Visualiser: http://goo.gl/Rzdwci

Thanks. Questions?

@maxmaxmaxmax
