GraphConnect Europe 2016 - Tuning Your Cypher - Petra Selmer, Mark Needham

Post on 10-Apr-2017

Page 1: GraphConnect Europe 2016 - Tuning Your Cypher - Petra Selmer, Mark Needham

Tuning Cypher

Mark Needham @markhneedham
Petra Selmer @Aethelraed

Why do we need to tune?

‣ No query planner is ever perfect
‣ You know your domain better than the database

The Cost planner

‣ Introduced in 2.2.0
‣ It uses the statistics service in Neo4j to assign costs to various query execution plans, picking the cheapest one
‣ All queries use this by default

Cypher query execution

‣ http://neo4j.com/docs/snapshot/execution-plans.html
‣ http://neo4j.com/blog/introducing-new-cypher-query-optimizer

How do I view a query plan?

‣ EXPLAIN
• shows the execution plan without actually executing it or returning any results.
‣ PROFILE
• executes the statement and returns the results along with profiling information.
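
Both keywords are simply prefixed to the statement. A minimal sketch, borrowing the :Movie label and title property from the data set used later in the talk:

```cypher
// Plan only: the statement is not executed and no rows are returned.
EXPLAIN
MATCH (movie:Movie {title: "The Matrix"})
RETURN movie;

// Executes the statement and reports rows and db hits per operator.
PROFILE
MATCH (movie:Movie {title: "The Matrix"})
RETURN movie;
```

Because PROFILE really runs the statement, EXPLAIN is the safer first step for queries that write data or are expected to run for a long time.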

Neo4j’s longest plan (so far…)

What is our goal?

At a high level, the goal is simple: get the number of db hits down.

What is a database hit?

“an abstract unit of storage engine work.”

Execution plan operators

‣ Operators to look out for
• All nodes scan: expensive
• Label scan: cheaper
• Node index seek: cheapest
• Node index scan: used for range queries
‣ http://neo4j.com/docs/3.0.0-RC1/execution-plans.html
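
As a rough illustration of where the first three operators tend to show up, the sketch below uses the :Movie label and title property from the talk's data set. The operator named in each comment is what one would typically expect to see in the plan, not a guarantee: the planner's choice depends on the Neo4j version, the statistics, and which indexes exist.

```cypher
// No label: every node in the graph has to be considered (AllNodesScan).
EXPLAIN MATCH (n {title: "The Matrix"}) RETURN n;

// Label, but no index on :Movie(title): scan the label, then filter (NodeByLabelScan).
EXPLAIN MATCH (m:Movie {title: "The Matrix"}) RETURN m;

// Same query once an index on :Movie(title) exists and is online:
// a direct index lookup (NodeIndexSeek).
EXPLAIN MATCH (m:Movie {title: "The Matrix"}) RETURN m;
```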

Our data set

Finding The Matrix

MATCH (movie {title: "The Matrix"})
RETURN movie

Tip: Use labels

MATCH (movie:Movie {title: "The Matrix"})
RETURN movie

Finding The Matrix

MATCH (movie {title: "The Matrix"})
RETURN movie

MATCH (movie:Movie {title: "The Matrix"})
RETURN movie

Tip: Use indexes and constraints

‣ Indexes for non-unique values
‣ Constraints for unique values

CREATE INDEX ON :Movie(title)
CREATE INDEX ON :Person(name)
CREATE CONSTRAINT ON (g:Genre) ASSERT g.name IS UNIQUE
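
The statements above use the syntax that was current when the talk was given. On more recent Neo4j releases (an assumption about the reader's setup, not something stated on the slides) the equivalent statements are written roughly as follows:

```cypher
// Newer index syntax; the variable names m, p and g are arbitrary.
CREATE INDEX FOR (m:Movie) ON (m.title);
CREATE INDEX FOR (p:Person) ON (p.name);

// Newer constraint syntax, with REQUIRE in place of ASSERT.
CREATE CONSTRAINT FOR (g:Genre) REQUIRE g.name IS UNIQUE;
```

In either syntax a uniqueness constraint is backed by an index, so :Genre(name) does not need a separate CREATE INDEX.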

How does Neo4j use indexes?

‣ Indexes are only used to find the starting point for queries.

Relational: Use index scans to look up rows in tables and join them with rows from other tables.
Graph: Use indexes to find the starting points for a query.

Tip: Use indexes and constraints

MATCH (movie:Movie {title: "The Matrix"})
RETURN movie

Finding The Matrix

(no index)
MATCH (movie:Movie {title: "The Matrix"})
RETURN movie

(index)
MATCH (movie:Movie {title: "The Matrix"})
RETURN movie

Actors who appeared together

MATCH (a:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->()<-[:ACTS_IN]-
      (b:Person {name:"Meg Ryan"})
RETURN COUNT(*)

Tip: Enforce index usage

MATCH (a:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->()<-[:ACTS_IN]-
      (b:Person {name:"Meg Ryan"})
USING INDEX a:Person(name)
USING INDEX b:Person(name)
RETURN COUNT(*)

Actors who appeared together

MATCH (a:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->()<-[:ACTS_IN]-
      (b:Person {name:"Meg Ryan"})
RETURN COUNT(*)

MATCH (a:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->()<-[:ACTS_IN]-
      (b:Person {name:"Meg Ryan"})
USING INDEX a:Person(name)
USING INDEX b:Person(name)
RETURN COUNT(*)

Tom Hanks’ colleagues’ movies

MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m1)<-[:ACTS_IN]-
      (coActor)-[:ACTS_IN]->(m2)
RETURN distinct m2.title

Tip: Reduce cardinality of work in progress (WIP)

MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m1)<-[:ACTS_IN]-
      (coActor)
WITH DISTINCT coActor
MATCH (coActor)-[:ACTS_IN]->(m2)
RETURN distinct m2.title

Tom Hanks’ colleagues’ movies

MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m1)<-[:ACTS_IN]-(coActor)
WITH DISTINCT coActor
MATCH (coActor)-[:ACTS_IN]->(m2)
RETURN distinct m2.title

MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m1)<-[:ACTS_IN]-
      (coActor)-[:ACTS_IN]->(m2)
RETURN distinct m2.title;
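
The WITH DISTINCT coActor step pays off because a co-actor who shared several movies with Tom Hanks appears once per shared movie, and each of those duplicate rows would otherwise be expanded to all of that co-actor's movies again. One way to gauge how much duplication there is in a given data set is to compare the two counts below (a sketch; the aliases are made up for illustration):

```cypher
// matchedRows counts one row per (shared movie, co-actor) pair;
// distinctCoActors counts each co-actor once.
MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m1)<-[:ACTS_IN]-(coActor)
RETURN count(coActor) AS matchedRows,
       count(DISTINCT coActor) AS distinctCoActors
```

The larger the gap between the two numbers, the more work the intermediate DISTINCT is likely to save.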

Hints

USING INDEX: forces the use of a specific index

MATCH (a:Person {name:"Tom Hanks"})-[:ACTS_IN]->()
USING INDEX a:Person(name)
RETURN count(*)

Hints

USING SCAN: forces a label scan on lower cardinality labels

MATCH (a:Actor)-->(m:Movie:Comedy)
USING SCAN m:Comedy
RETURN count(distinct a)

Even more tips...

Use parameters

MATCH (p:Person {name: {name}})
      -[:ACTS_IN]->(m)
RETURN m.title

MATCH (p:Person {name:"Tom Hanks"})
      -[:ACTS_IN]->(m)
RETURN m.title
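
The reason to prefer the parameterized form is that the query text stays identical whichever name is looked up, so a cached execution plan can be reused, whereas every new literal produces a different query string. The {name} syntax shown on the slide was later replaced by $name; on a recent Neo4j release (an assumption about the reader's setup) the same query would look like this:

```cypher
// $name is supplied by the client or driver together with the query text.
MATCH (p:Person {name: $name})
      -[:ACTS_IN]->(m)
RETURN m.title
```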

Avoid Cartesian products

‣ Easy to do this inadvertently:

MATCH (a:Actor), (m:Movie)
RETURN count(a), count(m)

‣ This is correct, and performs better:

MATCH (a:Actor)
WITH count(a) as a_count
MATCH (m:Movie)
RETURN a_count, count(m)
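
The first query is not just slow: it pairs every actor with every movie, so with A actors and M movies both count(a) and count(m) come back as A × M. Adding DISTINCT, as in the sketch below, repairs the numbers but not the cost, which is why the WITH version is the one to prefer:

```cypher
// Correct counts, but the A × M rows are still produced before aggregating.
MATCH (a:Actor), (m:Movie)
RETURN count(DISTINCT a), count(DISTINCT m)
```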

Watch out for those warnings!

Cardinalities

Watch those rows!

Only RETURN what you need

‣ This is not recommended:

MATCH (a:Actor)
RETURN a

‣ Use this instead:

MATCH (a:Actor)
RETURN a.name, a.birthdate, a.height

tl;dr

‣ View query plans with EXPLAIN and PROFILE
‣ Use labels
‣ Index your starting points
‣ Reduce work in progress
‣ Remember the hints

Thanks for coming

‣ And don’t forget, if the tips aren’t working, ask us for help on Stack Overflow!

Mark Needham @markhneedham Petra Selmer @Aethelraed