Cypher Query Language
Chicago Graph Database Meet-Up
Max De Marzi
What is Cypher?
•Graph Query Language for Neo4j
•Aims to make querying simple
Why Cypher?
•Existing Neo4j query mechanisms were not simple enough
•Too verbose (Java API)
•Too prescriptive (Gremlin)
SQL?
•Unable to express paths, which are crucial for graph-based reasoning
•Neo4j is schema/table free
SPARQL?
•SPARQL designed for a different data model
•Namespaces
•Properties as nodes
•High learning curve
Design
Design Decisions
Declarative
Most of the time, Neo4j knows better than you
Imperative                     | Declarative
follow relationship            | specify starting point
breadth-first vs. depth-first  | specify desired outcome
explicit algorithm             | algorithm adaptable based on query
Design Decisions
Pattern matching
[Diagram: a pattern over nodes A, B and C]
Design Decisions
ASCII-art patterns
() --> ()
Design Decisions
Directed relationship
(A) --> (B)
Design Decisions
Undirected relationship
(A) -- (B)
Design Decisions
specific relationships
A -[:LOVES]-> B
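One way to picture the directed, undirected and typed patterns above is as filters over a list of relationships. The following Python sketch assumes a hypothetical in-memory edge list. The nodes, the LOVES/KNOWS data and the `match_hop` helper exist only for this illustration and are not Neo4j's implementation.

```python
# Hypothetical sample graph: each relationship is (start, type, end).
RELS = [
    ("A", "LOVES", "B"),
    ("B", "KNOWS", "C"),
]


def match_hop(rels, start, rel_type=None, directed=True):
    """Return the nodes reachable from `start` in one hop.

    directed=True   mimics (start) --> (x)
    directed=False  mimics (start) -- (x)
    rel_type="T"    mimics (start) -[:T]-> (x); None means any type
    """
    found = []
    for s, t, e in rels:
        if rel_type is not None and t != rel_type:
            continue
        if s == start:
            found.append(e)
        elif not directed and e == start:
            found.append(s)  # undirected: the relationship may point at start
    return found


print(match_hop(RELS, "A"))                    # ['B']
print(match_hop(RELS, "B", directed=False))    # ['A', 'C']
print(match_hop(RELS, "A", rel_type="KNOWS"))  # []
```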
Design Decisions
Joined paths
A --> B --> C
Design Decisions
multiple paths
A --> B --> C, A --> C
A --> B --> C <-- A
Design Decisions
Optional relationships
A -[?]-> B
Design Decisions
Familiar for SQL users
SQL: select, from, where, group by, order by
Cypher: start, match, where, return
START
SELECT * FROM Person WHERE firstName = 'Max'
START max=node:persons(firstName = "Max") RETURN max
MATCH
SELECT skills.* FROM users JOIN skills ON users.id = skills.user_id WHERE users.id = 101
START user = node(101) MATCH user --> skills RETURN skills
Optional MATCH
SELECT skills.* FROM users LEFT JOIN skills ON users.id = skills.user_id WHERE users.id = 101
START user = node(101) MATCH user -[?]-> skills RETURN skills
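The MATCH versus optional MATCH pair above behaves like JOIN versus LEFT JOIN. The Python sketch below uses a hypothetical dict in place of each user's outgoing relationships and `None` in place of Cypher's null.

```python
# Hypothetical data: user 101 has two skills, user 102 has none.
HAS_SKILL = {101: ["cypher", "java"], 102: []}


def match_skills(user_id):
    """MATCH user --> skills: one row per relationship, no row if none exist."""
    return [(user_id, skill) for skill in HAS_SKILL.get(user_id, [])]


def optional_match_skills(user_id):
    """MATCH user -[?]-> skills: like a LEFT JOIN, a user without skills
    still yields one row, with None standing in for null."""
    rows = match_skills(user_id)
    return rows if rows else [(user_id, None)]


print(match_skills(102))           # []
print(optional_match_skills(102))  # [(102, None)]
```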
SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1
START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
Indexes
Used as multiple starting points, not to speed up any traversals
START a = node:nodes_index(type='User') MATCH a-[r:knows]-b RETURN ID(a), ID(b), r.weight
http://maxdemarzi.com/2012/03/16/jung-in-neo4j-part-2/
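The point that an index only supplies starting points can be shown with a small Python sketch. The `index` dict is a hypothetical stand-in for a legacy index like `nodes_index`, and the knows relationships and weights are made-up sample data. The traversal step never consults the index.

```python
from collections import defaultdict

# Hypothetical stand-in for a legacy index such as nodes_index:
# a (key, value) pair maps to every node id stored under it.
index = defaultdict(list)


def add_to_index(node_id, key, value):
    index[(key, value)].append(node_id)


for node_id in (1, 2, 3):
    add_to_index(node_id, "type", "User")

# Hypothetical knows relationships with a weight property: (start, end, weight).
KNOWS = [(1, 2, 0.5), (2, 3, 0.8)]


def knows_rows(key, value):
    """START a = node:nodes_index(key=value) MATCH a-[r:knows]-b
    RETURN ID(a), ID(b), r.weight"""
    rows = []
    for a in index[(key, value)]:         # the index only supplies starting points
        for start, end, weight in KNOWS:  # the traversal follows relationships
            if start == a:
                rows.append((a, end, weight))
            elif end == a:                # undirected pattern: match either end
                rows.append((a, start, weight))
    return rows


print(knows_rows("type", "User"))
# [(1, 2, 0.5), (2, 1, 0.5), (2, 3, 0.8), (3, 2, 0.8)]
```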
Complicated Match
Some UGLY recursive self join on the groups table
START max=node:person(name="Max") MATCH group <-[:BELONGS_TO*]- max RETURN group
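The variable-length pattern `<-[:BELONGS_TO*]-` takes the place of the recursive self join by following BELONGS_TO relationships for one or more hops. Below is a rough Python sketch over a hypothetical membership dict. Cypher returns one row per matching path, whereas this version returns each group only once.

```python
# Hypothetical BELONGS_TO relationships: member -> groups it belongs to.
BELONGS_TO = {
    "Max": ["Neo4j Users"],
    "Neo4j Users": ["Chicago Meetups"],
    "Chicago Meetups": ["Meetups"],
}


def groups_of(member):
    """Every group reachable from `member` by one or more BELONGS_TO hops."""
    seen, stack = [], list(BELONGS_TO.get(member, []))
    while stack:
        group = stack.pop()
        if group in seen:  # guard against cycles in the membership data
            continue
        seen.append(group)
        stack.extend(BELONGS_TO.get(group, []))
    return seen


print(groups_of("Max"))  # ['Neo4j Users', 'Chicago Meetups', 'Meetups']
```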
Where
SELECT person.* FROM person WHERE person.age > 32 OR person.hair = 'bald'
START person = node:persons("name:*") WHERE person.age >32 OR person.hair = "bald" RETURN person
Return
SELECT person.name, count(*) FROM Person GROUP BY person.name ORDER BY person.name
START person=node:persons("name:*") RETURN person.name, count(*) ORDER BY person.name
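The WHERE and RETURN slides above can be mirrored in plain Python. The two steps are chained here for brevity, and the people data is hypothetical. Cypher needs no GROUP BY because the non-aggregated column in RETURN acts as the grouping key.

```python
from collections import Counter

# Hypothetical person nodes, as property dicts.
people = [
    {"name": "Max", "age": 35, "hair": "brown"},
    {"name": "Ann", "age": 30, "hair": "bald"},
    {"name": "Bob", "age": 25, "hair": "red"},
    {"name": "Max", "age": 40, "hair": "bald"},
]

# WHERE person.age > 32 OR person.hair = "bald"
matched = [p for p in people if p["age"] > 32 or p["hair"] == "bald"]

# RETURN person.name, count(*) ORDER BY person.name
# person.name is not aggregated, so it becomes the grouping key.
counts = sorted(Counter(p["name"] for p in matched).items())
print(counts)  # [('Ann', 1), ('Max', 2)]
```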
Order By, Parameters
ORDER BY: same as SQL
Parameters: {node_id} is expected as part of the request
START me = node({node_id}) MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others RETURN me.name, friends.name, fof.name, fofof.name, count(others) ORDER BY friends.name, fof.name, fofof.name, count(others) DESC
http://maxdemarzi.com/2012/02/13/visualizing-a-network-with-cypher/
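Parameters are sent alongside the query text, not inside it. The sketch below only builds the request body. It assumes the JSON shape used by the Neo4j 1.x REST Cypher endpoint, with `query` and `params` fields, and the node id 42 is arbitrary.

```python
import json

# Shortened from the query on the slide; {node_id} stays a placeholder.
QUERY = (
    "START me = node({node_id}) "
    "MATCH (me)-[?:follows]->(friends) "
    "RETURN friends.name"
)


def cypher_request_body(node_id):
    """Return a JSON body for a parameterized Cypher request.

    Assumption: the server reads the query text from "query" and the
    parameter values from "params", as the Neo4j 1.x REST endpoint did.
    """
    return json.dumps({"query": QUERY, "params": {"node_id": node_id}})


print(cypher_request_body(42))
```

Because the query text stays constant and values never get spliced into the string, user input cannot change the structure of the query.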
Graph Functions
Some UGLY multiple recursive self and inner joins on the user and all related tables
START lucy=node(1000), kevin=node(759) MATCH p = shortestPath( lucy-[*]-kevin ) RETURN p
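`shortestPath` replaces those joins by searching outward from the start node. The slide does not say which algorithm Neo4j uses, so this sketch substitutes a plain breadth-first search over a hypothetical undirected adjacency dict. It returns the nodes along one shortest path.

```python
from collections import deque

# Hypothetical undirected graph: node id -> neighbouring node ids.
GRAPH = {
    1000: [1, 2],
    1: [1000, 759],
    2: [1000, 3],
    3: [2, 759],
    759: [1, 3],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: the first time `goal` is reached, the path is shortest."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no path between the two nodes


print(shortest_path(GRAPH, 1000, 759))  # [1000, 1, 759]
```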
Aggregate Functions
ID: get the Neo4j-assigned identifier
Count: add up the number of occurrences
Min: get the lowest value
Max: get the highest value
Avg: get the average of a numeric value
Distinct: remove duplicates
START me = node:nodes_index(type = 'user') MATCH (me)-[r?:wrote]-() RETURN ID(me), me.name, count(r), min(r.date), max(r.date) ORDER BY ID(me)
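The optional relationship `r?` affects the aggregates. A user with no :wrote relationships still produces a row, `count(r)` skips the null and yields 0, and `min`/`max` over no values give null. The Python sketch below reproduces that behavior with hypothetical users. ISO date strings stand in for `r.date`, because the slide does not show its actual type.

```python
# Hypothetical users: node id -> (name, dates of the things they wrote).
# An empty list means the optional r? relationship matched nothing.
WROTE = {
    1: ("Max", ["2012-01-05", "2012-03-16"]),
    2: ("Ann", []),
}


def wrote_stats():
    rows = []
    for node_id in sorted(WROTE):               # ORDER BY ID(me)
        name, dates = WROTE[node_id]
        rows.append((
            node_id,                            # ID(me)
            name,                               # me.name
            len(dates),                         # count(r): nulls are not counted
            min(dates) if dates else None,      # min(r.date): null when no values
            max(dates) if dates else None,      # max(r.date)
        ))
    return rows


for row in wrote_stats():
    print(row)
# (1, 'Max', 2, '2012-01-05', '2012-03-16')
# (2, 'Ann', 0, None, None)
```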
Functions
Collect: put all values in a list
START a = node:nodes_index(type='User') MATCH a-[:follows]->b RETURN a.name, collect(b.name)
http://maxdemarzi.com/2012/02/02/graph-visualization-and-neo4j-part-three/
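`collect` folds the rows for each grouping key into a single list. Here is a minimal Python equivalent of `RETURN a.name, collect(b.name)` over hypothetical follows data.

```python
from collections import defaultdict

# Hypothetical follows relationships: (a, b) means a -[:follows]-> b.
FOLLOWS = [("Max", "Ann"), ("Max", "Bob"), ("Ann", "Bob")]


def collect_followed(rels):
    """RETURN a.name, collect(b.name): one row per a, with every b in a list."""
    grouped = defaultdict(list)
    for a, b in rels:
        grouped[a].append(b)
    return dict(grouped)


print(collect_followed(FOLLOWS))  # {'Max': ['Ann', 'Bob'], 'Ann': ['Bob']}
```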
Combine Functions
Collect the ID of friends
START me = node:nodes_index(type = 'user') MATCH (me)<-[r?:wrote]-(friends) RETURN ID(me), me.name, collect(ID(friends)), collect(r.date) ORDER BY ID(me)
http://maxdemarzi.com/2012/03/08/connections-in-time/
Uses
Recommend Friends
START me = node({node_id}) MATCH (me)-[:friends]->(friend)-[:friends]->(foaf) RETURN foaf.name
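The recommendation query is a two-hop traversal. This Python sketch follows the query literally over a hypothetical adjacency dict. It does not exclude `me` or existing friends, and a friend-of-a-friend reached through two friends appears twice.

```python
# Hypothetical :friends relationships as outgoing adjacency lists.
FRIENDS = {
    "Max": ["Ann", "Bob"],
    "Ann": ["Max", "Cid"],
    "Bob": ["Cid", "Dee"],
}


def foaf_names(me):
    """(me)-[:friends]->(friend)-[:friends]->(foaf): one name per two-hop path."""
    return [
        foaf
        for friend in FRIENDS.get(me, [])
        for foaf in FRIENDS.get(friend, [])
    ]


print(foaf_names("Max"))  # ['Max', 'Cid', 'Cid', 'Dee']
```

A practical recommender would usually filter out `me` and existing friends, and it could treat the number of duplicates as a ranking signal.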
Uses
Six Degrees of Kevin Bacon
START me=node({start_node_id}), them=node({destination_node_id}) MATCH path = allShortestPaths( me-[?*]->them ) RETURN length(path), extract(person in nodes(path) : person.name)
Length: counts the number of relationships along a path
Extract: applies an expression to every element of a list, here returning the name of each node in the path
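`allShortestPaths` returns every path of minimal length, not just one. This Python sketch assumes a hypothetical directed graph with two equally short routes. It runs a breadth-first search that records all shortest-path predecessors, and `len(path) - 1` plays the role of `length(path)`. Nodes are identified by name here, so each path list is already what the `extract(...)` expression would return.

```python
from collections import deque

# Hypothetical directed graph with two equally short routes from "me" to "kevin".
GRAPH = {
    "me": ["ann", "bob"],
    "ann": ["kevin"],
    "bob": ["kevin"],
    "kevin": [],
}


def all_shortest_paths(graph, start, goal):
    """Breadth-first search that keeps every shortest route, not just the first."""
    dist = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another route of the same length
    if goal not in dist:
        return []

    def unwind(node):
        if node == start:
            return [[start]]
        return [path + [node] for p in parents[node] for path in unwind(p)]

    return unwind(goal)


for path in all_shortest_paths(GRAPH, "me", "kevin"):
    print(len(path) - 1, path)  # length(path) counts relationships
# 2 ['me', 'ann', 'kevin']
# 2 ['me', 'bob', 'kevin']
```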
Uses
Similar Users
START me = node(user1) MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u) WHERE abs(myRating.rating-otherRating.rating)<=2 RETURN u
Users who rated the same items within 2 points of my rating.
Abs: gets the absolute value of a number
http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendations-with.html
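The similar-users pattern compares two ratings of the same item. The following Python sketch runs over hypothetical ratings. Like the query, it returns a user once per qualifying item, so duplicates are possible.

```python
# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "me": {"item1": 8, "item2": 3},
    "u1": {"item1": 9, "item2": 6},
    "u2": {"item1": 2},
}


def similar_users(me):
    """Users who rated an item I rated, within 2 points of my rating:
    WHERE abs(myRating.rating - otherRating.rating) <= 2"""
    result = []
    for item, my_rating in RATINGS[me].items():
        for user, ratings in RATINGS.items():
            if user != me and item in ratings and abs(my_rating - ratings[item]) <= 2:
                result.append(user)  # one entry per qualifying item
    return result


print(similar_users("me"))  # ['u1']
```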
START me=node(user1), similarUsers=node(3) MATCH (similarUsers)-[r:RATED]->(item) WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item)) RETURN item
(node 3 is a result received from the first query)
Items with a rating > 7 that similar users rated, but I have not
Boolean Operations
And: this and that are true
Or: this or that is true
Not: this is false
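The second query combines a property test with a negated pattern. In this Python sketch over hypothetical ratings, "I have not rated it" becomes a dictionary membership test.

```python
# Hypothetical ratings: user -> {item: rating}.
RATINGS = {
    "me": {"item1": 8},
    "u1": {"item1": 9, "item2": 8, "item3": 5, "item4": 10},
}


def recommend(me, similar):
    """WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item))"""
    mine = RATINGS[me]
    return [
        item
        for item, rating in RATINGS[similar].items()
        if rating > 7 and item not in mine
    ]


print(recommend("me", "u1"))  # ['item2', 'item4']
```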
Predicates
START london = node(1), moscow = node(2) MATCH path = london -[*]-> moscow WHERE all(city in nodes(path) where city.capital = true) RETURN path
ALL: closure is true for all items
ANY: closure is true for any item
NONE: closure is true for no items
SINGLE: closure is true for exactly 1 item
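The four predicates differ only in how many path elements must pass the test. This Python sketch uses two hypothetical London-to-Moscow paths, with sample city names and capital flags.

```python
# Hypothetical paths from London to Moscow; each node is a dict of properties.
via_capitals = [
    {"name": "London", "capital": True},
    {"name": "Berlin", "capital": True},
    {"name": "Warsaw", "capital": True},
    {"name": "Minsk", "capital": True},
    {"name": "Moscow", "capital": True},
]
via_smolensk = [
    {"name": "London", "capital": True},
    {"name": "Berlin", "capital": True},
    {"name": "Smolensk", "capital": False},
    {"name": "Moscow", "capital": True},
]


def is_capital(city):
    return city["capital"] is True


def predicates(items, test):
    """Evaluate ALL / ANY / NONE / SINGLE by counting the items that pass `test`."""
    hits = sum(1 for item in items if test(item))
    return {
        "all": hits == len(items),
        "any": hits > 0,
        "none": hits == 0,
        "single": hits == 1,
    }


# WHERE all(city in nodes(path) where city.capital = true)
capital_only = [p for p in (via_capitals, via_smolensk) if predicates(p, is_capital)["all"]]
print([city["name"] for city in capital_only[0]])
# ['London', 'Berlin', 'Warsaw', 'Minsk', 'Moscow']
```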
Implementation
•Recursive matching with backtracking
START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
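Recursive matching with backtracking can be sketched in a few lines of Python. This is an illustration of the technique under stated assumptions, not Neo4j's matcher. The graph is a hypothetical edge list, and the match starts at node 0 because the slide elides the START clause. For simplicity, each step scans every relationship rather than only those attached to bound nodes. As in Cypher, a relationship is used at most once within a single match.

```python
# Hypothetical graph: each relationship is (start node id, end node id).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]

# MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b, split into single hops.
PATTERN = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]


def match(pattern, edges, bindings, used):
    """Yield every assignment of pattern identifiers to nodes.

    Each call handles the first remaining hop: it tries every unused
    relationship consistent with the current bindings, recurses on the
    rest of the pattern, then undoes its choice (backtracks).
    """
    if not pattern:
        yield dict(bindings)
        return
    (u, v), rest = pattern[0], pattern[1:]
    for i, (s, e) in enumerate(edges):
        # Skip relationships already used in this match, and ones that
        # contradict identifiers that are already bound.
        if i in used or bindings.get(u, s) != s or bindings.get(v, e) != e:
            continue
        added = []
        for var, node in ((u, s), (v, e)):
            if var not in bindings:
                bindings[var] = node
                added.append(var)
        if bindings[u] == s and bindings[v] == e:
            used.add(i)
            yield from match(rest, edges, bindings, used)
            used.discard(i)
        for var in added:  # backtrack: forget the nodes bound at this step
            del bindings[var]


# Assumption: START x=node(0)
for result in match(PATTERN, EDGES, {"x": 0}, set()):
    print(result)  # {'x': 0, 'y': 1, 'z': 2, 'a': 3, 'b': 4}
```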
Implementation
Execution Plan
start n=node(0) return n
Parameters()
Nodes(n)
Extract([n])
ColumnFilter([n])
Cypher is Pipes: lazily evaluated, pulling from the pipes underneath
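Python generators behave much like these lazily evaluated pipes. Nothing runs until the top pipe is asked for a row, and each pipe then pulls only what it needs from the one beneath it. The sketch below is a simplified version of the plan above. The pipe names follow the slide, but their bodies are assumptions, and the extra node ids 1 and 2 are there only to make the laziness visible.

```python
def parameters():
    yield {}  # a single empty row for the pipes above to extend


def nodes(source, name, node_ids, log):
    for row in source:
        for node_id in node_ids:
            log.append(node_id)  # record each time a node is actually produced
            yield {**row, name: node_id}


def extract(source, expressions):
    for row in source:
        yield {**row, **{col: fn(row) for col, fn in expressions.items()}}


def column_filter(source, columns):
    for row in source:
        yield {col: row[col] for col in columns}


def build_plan(node_ids, log):
    """Wire up the pipes bottom-up, as in the plan for: start n=node(0) return n."""
    return column_filter(
        extract(nodes(parameters(), "n", node_ids, log), {"n": lambda row: row["n"]}),
        ["n"],
    )


log = []
plan = build_plan([0, 1, 2], log)
print(log)         # [] : building the plan ran nothing
print(next(plan))  # {'n': 0}
print(log)         # [0] : only one row was pulled through the pipes
```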
Implementation
Execution Plan
start n=node(0)match n-[*]-> b return n.name, n, count(*) order by n.age
Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
Extract([n.age])
Sort(n.age ASC)
ColumnFilter([n.name, n, count(*)])
Implementation
Execution Plan
start n=node(0) match n-[*]-> b return n.name, n, count(*) order by n.name
Parameters()
Nodes(n)
PatternMatch(n-[*]->b)
Extract([n.name, n])
Sort(n.name ASC, n ASC)
EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
ColumnFilter([n.name, n, count(*)])
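The two plans differ in where Sort sits relative to aggregation. When the ORDER BY keys are the grouping keys, sorting first leaves the result unchanged and makes each group contiguous, which would allow it to be counted while streaming. The slide does not say whether Neo4j takes advantage of this. The Python sketch below compares both orders on hypothetical rows, adding a second node (Ann) so that there is more than one group.

```python
from collections import Counter
from itertools import groupby

# Hypothetical PatternMatch output: one (n.name, n) row per path found.
rows = [("Max", 0), ("Ann", 1), ("Max", 0), ("Ann", 1), ("Max", 0)]


def aggregate_then_sort(rows):
    # Count every group in a hash table first, then sort the grouped results.
    return sorted(Counter(rows).items())


def sort_then_aggregate(rows):
    # Sort first: each group is now contiguous, so groupby can count it
    # while streaming through the rows.
    return [(key, sum(1 for _ in group)) for key, group in groupby(sorted(rows))]


print(sort_then_aggregate(rows))  # [(('Ann', 1), 2), (('Max', 0), 3)]
```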
Thanks for Listening!
Questions?
maxdemarzi.com