Cypher Query Language

Chicago Graph Database Meet-Up

Max De Marzi

What is Cypher?

•Graph Query Language for Neo4j

•Aims to make querying simple

Why Cypher?

•Existing Neo4j query mechanisms were not simple enough

•Too verbose (Java API)

•Too prescriptive (Gremlin)

SQL?

•Unable to express paths

•these are crucial for graph-based reasoning

•Neo4j is schema/table free

SPARQL?

•SPARQL designed for a different data model

•namespaces

•properties as nodes

•high learning curve

Design

Design Decisions

Declarative

Most of the time, Neo4j knows better than you

Imperative                       Declarative
follow relationship              specify starting point
breadth-first vs depth-first     specify desired outcome
explicit algorithm               algorithm adaptable based on query

Design Decisions

Pattern matching

Design Decisions

ASCII-art patterns

() --> ()

Design Decisions

Directed relationship

(A) --> (B)

Design Decisions

Undirected relationship

(A) -- (B)

Design Decisions

specific relationships

A -[:LOVES]-> B

Design Decisions

Joined paths

A --> B --> C

Design Decisions

multiple paths

A --> B --> C, A --> C

A --> B --> C <-- A

Design Decisions

Optional relationships

A -[?]-> B

Design Decisions

Familiar for SQL users

SQL: select, from, where, group by, order by

Cypher: start, match, where, return

START

SELECT * FROM Person WHERE firstName = 'Max'

START max=node:persons(firstName = "Max") RETURN max

MATCH

SELECT skills.* FROM users JOIN skills ON users.id = skills.user_id WHERE users.id = 101

START user = node(101) MATCH user --> skills RETURN skills

Optional MATCH

SELECT skills.* FROM users LEFT JOIN skills ON users.id = skills.user_id WHERE users.id = 101

START user = node(101) MATCH user -[?]-> skills RETURN skills
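
The difference between MATCH and an optional MATCH mirrors JOIN versus LEFT JOIN. The following is a minimal Python sketch of that behaviour, not Neo4j's implementation; the `SKILLS` data and both function names are assumptions made up for illustration, and the rows carry the user id only to make the retained row visible.

```python
# Hypothetical data: user id -> skills reachable over an outgoing relationship.
SKILLS = {101: ["Ruby", "Neo4j"], 102: []}


def match_skills(user_id, skills):
    """Plain MATCH: one row per matched skill, no rows when nothing matches."""
    return [(user_id, skill) for skill in skills.get(user_id, [])]


def optional_match_skills(user_id, skills):
    """Optional MATCH: same rows, but a user without skills keeps one row with None."""
    rows = match_skills(user_id, skills)
    return rows if rows else [(user_id, None)]


print(match_skills(102, SKILLS))
print(optional_match_skills(102, SKILLS))
```

As with a LEFT JOIN, the user without skills disappears from the plain match but survives the optional one with an empty value.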

SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1

START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill

Indexes

Used as multiple starting points, not to speed up any traversals

START a = node:nodes_index(type='User')
MATCH a-[r:knows]-b
RETURN ID(a), ID(b), r.weight

http://maxdemarzi.com/2012/03/16/jung-in-neo4j-part-2/

Complicated Match

Some UGLY recursive self join on the groups table

START max=node:person(name="Max") MATCH group <-[:BELONGS_TO*]- max RETURN group
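
To make the variable-length `[:BELONGS_TO*]` step concrete, here is a small Python sketch of what "follow BELONGS_TO one or more times" means. It is an illustration under assumptions, not how Neo4j evaluates the pattern: the `BELONGS_TO` sample data and the `groups_of` helper are hypothetical, and the sketch returns each group once, whereas Cypher returns one row per matching path.

```python
from collections import deque

# Hypothetical sample data: each key BELONGS_TO every group in its list.
BELONGS_TO = {
    "Max": ["Chicago Graph Meet-Up"],
    "Chicago Graph Meet-Up": ["Graph Database Groups"],
    "Graph Database Groups": ["Technology Groups"],
}


def groups_of(start, belongs_to):
    """Return every group reachable from start over one or more BELONGS_TO hops."""
    seen = []
    queue = deque(belongs_to.get(start, []))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already reported; also guards against cycles
        seen.append(group)
        queue.extend(belongs_to.get(group, []))
    return seen


print(groups_of("Max", BELONGS_TO))
```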

Where

SELECT person.* FROM person WHERE person.age >32 OR person.hair = "bald"

START person = node:persons("name:*") WHERE person.age >32 OR person.hair = "bald" RETURN person

Return

SELECT person.name, count(*) FROM Person GROUP BY person.name ORDER BY person.name

START person=node:persons("name:*") RETURN person.name, count(*) ORDER BY person.name

Order By, Parameters

Same as SQL

{node_id} expected as part of request

START me = node({node_id})
MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others
RETURN me.name, friends.name, fof.name, fofof.name, count(others)
ORDER BY friends.name, fof.name, fofof.name, count(others) DESC

http://maxdemarzi.com/2012/02/13/visualizing-a-network-with-cypher/

Graph Functions

Some UGLY multiple recursive self and inner joins on the user and all related tables

START lucy=node(1000), kevin=node(759) MATCH p = shortestPath( lucy-[*]-kevin ) RETURN p
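
For readers unfamiliar with what a shortest-path function has to do, the sketch below runs a breadth-first search over an undirected graph in Python. It is only an illustration of the idea; the deck does not say how Neo4j implements `shortestPath`, and the `GRAPH` data (apart from node ids 1000 and 759, taken from the query) is hypothetical.

```python
from collections import deque

# Hypothetical undirected graph: node id -> neighbouring node ids.
GRAPH = {
    1000: [1, 2],
    1: [1000, 3],
    2: [1000, 759],
    3: [1, 759],
    759: [2, 3],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of node ids, or None."""
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(shortest_path(GRAPH, 1000, 759))
```

Because breadth-first search visits nodes in order of distance from the start, the first time it reaches the goal it has found a path with the fewest relationships.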

Aggregate Functions

ID: get the neo4j assigned identifier
Count: add up the number of occurrences
Min: get the lowest value
Max: get the highest value
Avg: get the average of a numeric value
Distinct: remove duplicates

START me = node:nodes_index(type = 'user')
MATCH (me)-[r?:wrote]-()
RETURN ID(me), me.name, count(r), min(r.date), max(r.date)
ORDER BY ID(me)

Functions

Collect: put all values in a list

START a = node:nodes_index(type='User')
MATCH a-[:follows]->b
RETURN a.name, collect(b.name)
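
The effect of `collect` alongside a non-aggregated column can be pictured as grouping rows by that column and gathering the rest into a list. A minimal Python sketch of that grouping follows; the `FOLLOWS` pairs and the `collect_by_key` helper are hypothetical and only stand in for the matched `a-[:follows]->b` rows.

```python
# Hypothetical (a.name, b.name) pairs standing in for a-[:follows]->b matches.
FOLLOWS = [("Max", "Peter"), ("Max", "Andreas"), ("Peter", "Max")]


def collect_by_key(rows):
    """Group the second column into a list per distinct value of the first column."""
    grouped = {}
    for key, value in rows:
        grouped.setdefault(key, []).append(value)
    return grouped


print(collect_by_key(FOLLOWS))
```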

http://maxdemarzi.com/2012/02/02/graph-visualization-and-neo4j-part-three/

Combine Functions

Collect the ID of friends

START me = node:nodes_index(type = 'user')
MATCH (me)<-[r?:wrote]-(friends)
RETURN ID(me), me.name, collect(ID(friends)), collect(r.date)
ORDER BY ID(me)

http://maxdemarzi.com/2012/03/08/connections-in-time/

Uses

Recommend Friends

START me = node({node_id}) MATCH (me)-[:friends]->(friend)-[:friends]->(foaf) RETURN foaf.name

Uses

Six Degrees of Kevin Bacon

START me=node({start_node_id}), them=node({destination_node_id}) MATCH path = allShortestPaths( me-[?*]->them ) RETURN length(path), extract(person in nodes(path) : person.name)

Length: counts the number of relationships along a path
Extract: applies an expression to each item in a collection (here, the name of each node in the path)

Uses

Similar Users

START me = node(user1)
MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
WHERE abs(myRating.rating-otherRating.rating)<=2
RETURN u

Users who rated same items within 2 points.

Abs: gets absolute numeric value
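
The similarity rule (same item, ratings within 2 points) can be sketched outside the database as follows. This is an illustrative Python version under assumptions: the `RATINGS` data and the `similar_users` helper are hypothetical, and each similar user is returned once, whereas the Cypher query returns `u` once per matching item.

```python
# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "me": {"Alien": 9, "Heat": 6},
    "ann": {"Alien": 8, "Heat": 1},
    "bob": {"Alien": 3},
}


def similar_users(ratings, me, max_diff=2):
    """Users who rated at least one of my items within max_diff points of my rating."""
    mine = ratings[me]
    result = []
    for user, theirs in ratings.items():
        if user == me:
            continue
        if any(
            item in mine and abs(mine[item] - rating) <= max_diff
            for item, rating in theirs.items()
        ):
            result.append(user)
    return result


print(similar_users(RATINGS, "me"))
```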

http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendations-with.html

Boolean Operations

START me=node(user1), similarUsers=node(3)
MATCH (similarUsers)-[r:RATED]->(item)
WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item))
RETURN item

(node(3) is the result received in the first query)

Items with a rating > 7 that similar users rated, but I have not

And: this and that are true
Or: this or that is true
Not: this is false
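
The recommendation filter (rating > 7 AND NOT already rated by me) reads naturally as two boolean conditions. A minimal Python sketch, assuming hypothetical `ITEM_RATINGS` data and a hypothetical `recommend` helper that takes the similar users found earlier:

```python
# Hypothetical ratings: user -> {item: rating}.
ITEM_RATINGS = {
    "me": {"Alien": 9},
    "ann": {"Alien": 8, "Heat": 9, "Ronin": 5},
}


def recommend(ratings, me, similar, threshold=7):
    """Items that similar users rated above threshold and that I have not rated."""
    mine = ratings[me]
    items = []
    for user in similar:
        for item, rating in ratings[user].items():
            # rating > threshold AND NOT rated by me (AND not already listed)
            if rating > threshold and item not in mine and item not in items:
                items.append(item)
    return items


print(recommend(ITEM_RATINGS, "me", ["ann"]))
```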



Predicates

START london = node(1), moscow = node(2)
MATCH path = london -[*]-> moscow
WHERE all(city in nodes(path) where city.capital = true)
RETURN path

ALL: closure is true for all items
ANY: closure is true for any item
NONE: closure is true for no items
SINGLE: closure is true for exactly 1 item
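
The four predicates differ only in how many items must satisfy the closure. The Python sketch below spells that out; the helper names (`all_of`, `any_of`, `none_of`, `single_of`) and the sample cities are assumptions for illustration and are not Cypher functions.

```python
def all_of(predicate, items):
    """True when the predicate holds for every item."""
    return all(predicate(item) for item in items)


def any_of(predicate, items):
    """True when the predicate holds for at least one item."""
    return any(predicate(item) for item in items)


def none_of(predicate, items):
    """True when the predicate holds for no item."""
    return not any(predicate(item) for item in items)


def single_of(predicate, items):
    """True when the predicate holds for exactly one item."""
    return sum(1 for item in items if predicate(item)) == 1


def is_capital(city):
    return city["capital"]


# Hypothetical path from London to Moscow passing through a non-capital city.
PATH = [
    {"name": "London", "capital": True},
    {"name": "Lyon", "capital": False},
    {"name": "Moscow", "capital": True},
]

print(all_of(is_capital, PATH), any_of(is_capital, PATH))
```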

Implementation

•Recursive matching with backtracking

START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
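
To show what recursive matching with backtracking looks like, here is a small Python sketch that matches the pattern from this slide against a toy graph. It is a simplified illustration, not Neo4j's matcher: the `GRAPH` data and the `match` function are hypothetical, and the sketch does not enforce Cypher's rule that a relationship is used at most once per match.

```python
# Hypothetical directed graph: node id -> ids of nodes it points to.
GRAPH = {1: [2, 3], 2: [3], 3: [4, 5], 4: [5]}

# The slide's pattern, one (source, target) pair per "-->" step:
# x-->y, x-->z, y-->z, z-->a-->b, z-->b
PATTERN = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]


def match(pattern, graph, bindings):
    """Yield every assignment of pattern variables to nodes that satisfies all steps."""
    if not pattern:
        yield dict(bindings)  # every step matched
        return
    (src, dst), rest = pattern[0], pattern[1:]
    sources = [bindings[src]] if src in bindings else list(graph)
    for s in sources:
        for d in graph.get(s, ()):
            if bindings.get(dst, d) != d:
                continue  # dst is already bound to a different node
            if src == dst and s != d:
                continue  # a self-loop step needs the same node on both ends
            # Recurse on the remaining steps. When the recursion finds nothing,
            # control returns here and the next candidate is tried (backtracking).
            yield from match(rest, graph, {**bindings, src: s, dst: d})


print(list(match(PATTERN, GRAPH, {"x": 1})))
```

Starting from a bound `x` keeps the search small: each step either confirms an existing binding or tries the candidates one at a time, abandoning a branch as soon as a later step cannot be satisfied.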

Implementation

Execution Plan

start n=node(0) return n

Parameters()
Nodes(n)
Extract([n])
ColumnFilter([n])

Cypher is Pipes

lazily evaluated, pulling from pipes underneath
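
The pull-based pipe idea can be sketched with Python generators, where each pipe asks the pipe underneath for a row only when it is asked for one itself. This is an analogy under assumptions, not Neo4j's code: the function names are hypothetical stand-ins for the plan steps, and three node ids are used instead of the single node(0) so that the laziness is visible.

```python
def parameters_pipe():
    """Bottom of the plan: a single empty row."""
    yield {}


def nodes_pipe(source, name, node_ids, log):
    """Bind `name` to each node id, recording when a row is actually produced."""
    for row in source:
        for node_id in node_ids:
            log.append(f"Nodes({name}) produced {node_id}")
            yield {**row, name: node_id}


def extract_pipe(source, expressions):
    """Evaluate each named expression against the row and add the results as columns."""
    for row in source:
        yield {**row, **{column: expr(row) for column, expr in expressions.items()}}


def column_filter_pipe(source, columns):
    """Keep only the requested columns."""
    for row in source:
        yield {column: row[column] for column in columns}


log = []
plan = column_filter_pipe(
    extract_pipe(
        nodes_pipe(parameters_pipe(), "n", [0, 1, 2], log),
        {"n": lambda row: row["n"]},
    ),
    ["n"],
)

first_row = next(plan)  # pulls exactly one row through every pipe
print(first_row, log)
```

Building the plan does no work at all; asking for the first row makes the bottom pipes produce only what that one row needs.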

Implementation

Execution Plan

start n=node(0) match n-[*]-> b return n.name, n, count(*) order by n.age

Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
Extract([n.age])
Sort(n.age ASC)
ColumnFilter([n.name,n,count(*)])

Implementation

Execution Plan

start n=node(0) match n-[*]-> b return n.name, n, count(*) order by n.name

Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
Sort(n.name ASC,n ASC)
EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
ColumnFilter([n.name,n,count(*)])

Thanks for Listening!

Questions?

maxdemarzi.com
