AgensGraph: A Multi-Model Graph Database
What Is a Graph Database?
• Change in data representation
  • Gartner says “it represents a radical change in how data is organized and processed”
• The relationship is a first-class citizen in a graph database
  • In a relational database, relationships are handled implicitly
  • In a graph database, you can make your data more connected
              Relational Database   Graph Database
Entity        Row                   Node (Vertex)
Relationship  Row                   Relationship (Edge)
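The mapping in the table above can be sketched in a few lines of Python. This is only an illustration: the `Vertex` and `Edge` classes, the `person`/`knows` labels, and the sample rows are hypothetical and are not taken from AgensGraph.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    start: int  # id of the source vertex
    end: int    # id of the target vertex
    properties: dict = field(default_factory=dict)


# Relational view: entities and relationships are both plain rows.
person_rows = [(1, "Alice"), (2, "Bob")]
knows_rows = [(1, 2, 2015)]  # (person_id, friend_id, since)

# Graph view: entity rows become vertexes, relationship rows become edges.
vertexes = {pid: Vertex(pid, "person", {"name": name}) for pid, name in person_rows}
edges = [
    Edge(i, "knows", start, end, {"since": since})
    for i, (start, end, since) in enumerate(knows_rows, start=1)
]


def neighbors(vertex_id):
    """Names of the vertexes reachable over one outgoing edge."""
    return [vertexes[e.end].properties["name"] for e in edges if e.start == vertex_id]


print(neighbors(1))
```

The relationship is stored as its own object with a start and an end, so following it needs no join condition.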
Benefits of Graph Databases
• Intuitive data modeling
  • ER diagram-like data model
• Concise queries
  • No need to specify joins and their conditions
  • Ex) Cypher (by Neo Technology), SPARQL (by W3C)
• Performance for graph pattern matching
  • Optimized for processing graph traversals
• Graph analysis
  • Provides built-in graph analysis functions
  • Ex) PageRank, ShortestPath, graph clustering
Intuitive Data Modeling
Cypher Query Language
• Cypher is like SQL for graph databases
  • Declarative query language for the property graph model
• Developed by Neo Technology, Inc. since 2011
• Inspired by SQL and SPARQL (the standard query language for RDF)
  • Designed to be a human-readable query language
• OpenCypher.org (http://opencypher.org)
  • We participate in developing the query language
Cypher Example
• Uses graph pattern matching and ASCII-art diagrams
Query: Find all ancestor-descendant pairs in the graph.
SQL
with recursive descendants as (
  select parent, child as descendant, 1 as level from source
  union all
  select d.parent, s.child, d.level + 1
  from descendants as d
  join source s on d.descendant = s.parent
)
select * from descendants
order by parent, level, descendant;
Cypher
MATCH p=(descendant)-[:Parent*]->(ancestor)
RETURN (ancestor), (descendant), length(p)
ORDER BY (ancestor), (descendant), length(p)
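To see what the recursive SQL version of this query returns, it can be run with Python's built-in sqlite3 module. This is a sketch under assumptions: the CTE is named descendants (the name the slide's FROM clause refers to), the three-row source table is made-up sample data, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (parent TEXT, child TEXT)")
# Hypothetical sample data: a chain a -> b -> c -> d.
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d")],
)

QUERY = """
WITH RECURSIVE descendants AS (
    -- Base case: every direct parent/child pair is a level-1 descendant.
    SELECT parent, child AS descendant, 1 AS level FROM source
    UNION ALL
    -- Recursive step: extend each known pair by one more generation.
    SELECT d.parent, s.child, d.level + 1
    FROM descendants AS d
    JOIN source AS s ON d.descendant = s.parent
)
SELECT * FROM descendants
ORDER BY parent, level, descendant
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The three-line Cypher pattern expresses the same traversal without naming the join or the recursion explicitly.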
Graph Databases
• There are many graph databases
Graph DB vendors
• Neo4j – Single node, OLTP, Cypher
• Datastax Enterprise Graph – Cassandra, Gremlin, OLTP & OLAP
• OrientDB – Cluster, SQL-like language, document storage
Big vendors
• Oracle – 12c spatial and network
• SAP – HANA graph, supports Cypher, columnar storage
• IBM – provides a cloud service based on Titan, System G
• Microsoft – Graph engine
• Teradata Aster Database – provides graph analytics
RDF DB
• Virtuoso, AllegroGraph, GraphDB (Ontotext)
Graph analysis
• Giraph – Apache project
• GraphX – Spark module
• GraphLab – renamed Turi (turi.com) and acquired by Apple
NoSQL
• MongoDB – provides simple graph lookup from 3.4 (Dec 2016)
• ElasticSearch – provides graph visualization and modeling
Etc.
• Objectivity’s ThingSpan
• ArangoDB
• JanusGraph – Forked from Titan, supported by the Linux Foundation
• Grakn.AI – Using Titan and Spark
What We Want to Implement
• Property graph model
• OpenCypher query language
• ACID transaction
• OLTP workload and graph analytics framework
• We chose to implement it based on PostgreSQL because it already has
  • Robust storage engine
  • Transaction layer using MVCC
  • Cost-based query optimizer
AgensGraph
• Newest release: v1.1 (based on PostgreSQL v9.6.2)
  • Homepage: http://www.agensgraph.com
  • Download: http://bitnine.net/downloads/
  • Github: https://github.com/bitnine-oss/agensgraph
• A forked project of PostgreSQL (Apache license)
• Features
  • Multi-model: property graph data model, relational data model and JSON documents
  • Cypher query language support
  • Integrated querying using SQL and Cypher
  • Multiple graphs and hierarchical graph label organization
  • Property indexes on both vertexes and edges
  • Constraints: unique, mandatory and check constraints
AgensGraph Data Model
• Extended property graph model with JSON documents
• Supports multiple graphs in a database
• Label hierarchy
  • Vertexes and edges can be grouped into labels (e.g. person, student, teacher, …)
  • Labels are organized as a hierarchy
• Property indexes using B-tree, GIN, BRIN, … for both vertexes and edges
Figure: two vertexes connected by an edge
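The label hierarchy can be illustrated with a small Python sketch. The `LabelHierarchy` class and the vertex names are hypothetical, and the behaviour shown (matching on a parent label also returns vertexes of its child labels) is an assumption about how an inheritance-based hierarchy is typically queried, not a statement of AgensGraph internals.

```python
class LabelHierarchy:
    """Toy model of vertex labels organized as an inheritance tree."""

    def __init__(self):
        self.parent = {}   # label -> parent label (None for a root label)
        self.members = {}  # label -> names of vertexes created with that label

    def create_vlabel(self, label, inherits=None):
        self.parent[label] = inherits
        self.members[label] = []

    def add_vertex(self, label, name):
        self.members[label].append(name)

    def descendants(self, label):
        """The label itself plus every label that inherits from it."""
        result = [label]
        for child, parent in self.parent.items():
            if parent == label:
                result.extend(self.descendants(child))
        return result

    def match(self, label):
        """Vertexes carrying the label or any of its child labels."""
        return sorted(
            name for lbl in self.descendants(label) for name in self.members[lbl]
        )


labels = LabelHierarchy()
labels.create_vlabel("person")
labels.create_vlabel("student", inherits="person")
labels.create_vlabel("teacher", inherits="person")
labels.add_vertex("person", "Ann")
labels.add_vertex("student", "Bob")
labels.add_vertex("teacher", "Cho")

print(labels.match("person"))
print(labels.match("student"))
```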
Cypher Clauses
• For reading
  • MATCH: find graph patterns
  • OPTIONAL MATCH: allows incomplete matches
• For updating
  • CREATE: create a vertex or an edge
  • MERGE: like UPSERT
  • SET: modify property values
• For filtering
  • WHERE
• For handling results
  • WITH, RETURN
• And ORDER BY, LIMIT, SKIP
https://s3.amazonaws.com/artifacts.opencypher.org/M05/railroad/Cypher.html
Example
• Create graph objects
• If you want a label hierarchy:
  CREATE VLABEL student INHERITS (person);
Example
• Create vertexes
Example
• Create property indexes
• Create relationships
AgensGraph Architecture
• Developed in the core of the PostgreSQL engine
  • Not a layered architecture (e.g. Titan)
  • A fork of PostgreSQL
  • PostgreSQL is very reliable and robust
• Adds graph objects
• Extends the query engine to support Cypher queries and fast graph traversal
• Maintains the transaction and storage layers
Architecture diagram layers:
• JDBC/ODBC/Python/Node.js Driver
• SQL & Cypher
• Integrated Query Processing Engine: graph query optimizer, graph query executor
• Transaction Layer: supports MVCC and ACID transactions
• Cache Layer: supports caching graph data in memory
• Graph Storage: supports label hierarchy; optimized for fast traversal and updates
Graph Storage
• Uses PostgreSQL’s heap tables and B-tree indexes
• Uses composite indexes on edge tables to exploit index-only scans for traversals
• We found that heap tables and B-trees are fast enough to process graph workloads
• But we plan to design a new storage layer for large-scale graph processing
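The idea behind the composite edge index can be tried out with Python's sqlite3 module. This is a sketch only: SQLite's "covering index" plays the role of PostgreSQL's index-only scan, and the edge table layout and index name are hypothetical, not AgensGraph's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: start/end vertex ids plus a properties column.
conn.execute(
    "CREATE TABLE edge ("
    "id INTEGER PRIMARY KEY, start_id INTEGER, end_id INTEGER, properties TEXT)"
)
# Composite index that holds everything a one-hop traversal needs.
conn.execute("CREATE INDEX edge_start_end ON edge (start_id, end_id)")
conn.executemany(
    "INSERT INTO edge (start_id, end_id, properties) VALUES (?, ?, ?)",
    [(1, 2, "{}"), (1, 3, "{}"), (2, 3, "{}")],
)


def plan(sql):
    """Return the planner's description of how it will run the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Traversal touches only indexed columns, so the table itself is never read.
traversal_plan = plan("SELECT end_id FROM edge WHERE start_id = 1")
# Reading a property forces a lookup into the table for every matching edge.
property_plan = plan("SELECT properties FROM edge WHERE start_id = 1")

print(traversal_plan)
print(property_plan)
```

The second plan shows why a traversal that reads edge properties cannot be answered from the index alone.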
Cypher Query Processor
• Cypher queries are processed by the same pipeline as SQL
• We integrate Cypher query processing with the SQL query engine, from the parser to the executor
• So you can use any PostgreSQL expressions and functions in Cypher
• The result of a Cypher query is a relation
  • We treat a Cypher query as a subquery
  • Existing query optimizations can be applied to Cypher queries too (e.g. subquery rollup, predicate push-down, join ordering, …)
• You can write a query that combines SQL with Cypher as a subquery
Figure: Cypher Query → Parser → AST → Analyze → Query Tree → Plan & Optimize → Plan Tree → Execute
Cypher Implementation Issues
• A Cypher query is a chain of Cypher clauses
• Each clause produces its results as a relation
• Chained execution
  • The results of one clause are passed to the next clause
• Transform a Cypher query into a query tree
  • Each clause is transformed into a query structure
  • A MATCH clause is transformed into a query structure with joins
  • The chained clauses are combined as subqueries
Cypher Query Processor
Figure: subquery rollup for two chained MATCH clauses
• Before rollup: Query (the first MATCH) joins Actor table {name = ‘Tom Cruise’}, ACT_IN table and Movie table; Query (the second MATCH) joins that result with ACT_IN table and Actor {name: ‘Nicole Kidman’}
• After rollup: a single Query joins Actor table {name = ‘Tom Cruise’}, ACT_IN table, Movie table, ACT_IN table and Actor {name: ‘Nicole Kidman’}
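The rollup in this figure can be approximated with plain SQL run through Python's sqlite3 module. Everything here is an assumption made for illustration: the actor, movie and act_in table layouts, the sample rows, and the reading of the figure as "find movies in which both Tom Cruise and Nicole Kidman acted". It shows only that the chained form and the flattened form return the same relation, not how AgensGraph builds its query tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE act_in (actor_id INTEGER, movie_id INTEGER);
    INSERT INTO actor VALUES (1, 'Tom Cruise'), (2, 'Nicole Kidman');
    INSERT INTO movie VALUES (10, 'Eyes Wide Shut'), (11, 'Far and Away'),
                             (12, 'Top Gun'), (13, 'Moulin Rouge!');
    INSERT INTO act_in VALUES (1, 10), (2, 10), (1, 11), (2, 11), (1, 12), (2, 13);
    """
)

# Chained form: the first MATCH is a subquery whose result feeds the second.
CHAINED = """
SELECT first_match.title
FROM (
    SELECT m.id AS movie_id, m.title AS title
    FROM actor a
    JOIN act_in r ON r.actor_id = a.id
    JOIN movie m ON m.id = r.movie_id
    WHERE a.name = 'Tom Cruise'
) AS first_match
JOIN act_in r2 ON r2.movie_id = first_match.movie_id
JOIN actor b ON b.id = r2.actor_id
WHERE b.name = 'Nicole Kidman'
ORDER BY first_match.title
"""

# Rolled-up form: one flat join that the optimizer is free to reorder.
FLAT = """
SELECT m.title
FROM actor a
JOIN act_in r ON r.actor_id = a.id
JOIN movie m ON m.id = r.movie_id
JOIN act_in r2 ON r2.movie_id = m.id
JOIN actor b ON b.id = r2.actor_id
WHERE a.name = 'Tom Cruise' AND b.name = 'Nicole Kidman'
ORDER BY m.title
"""

chained_titles = [row[0] for row in conn.execute(CHAINED)]
flat_titles = [row[0] for row in conn.execute(FLAT)]
print(chained_titles, flat_titles)
```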
Variable-length Edge (VLE) Query
• Can be implemented using a recursive common table expression (CTE) in SQL
• But we found that a CTE is inefficient for VLE queries
  • Using a CTE is BFS (Breadth First Search)-style processing
  • BFS processing needs to buffer intermediate results
• We implemented a new execution node for VLE queries
  • DFS-style processing
• It is much faster than a recursive CTE query
Cypher
MATCH p=(descendant)-[:Parent*]->(ancestor)
RETURN (ancestor), (descendant), length(p)
ORDER BY (ancestor), (descendant), length(p)
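The difference between BFS-style and DFS-style processing of a variable-length edge can be sketched in Python. This is a simplified illustration, not AgensGraph's executor: the graph, the function names and the hop limit are hypothetical, and edge-uniqueness rules are ignored.

```python
# Hypothetical graph: adjacency lists of outgoing edges.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def vle_bfs(graph, start, max_hops):
    """Expand level by level, buffering each whole frontier before moving on,
    much like the working table of a recursive CTE."""
    results, frontier = [], [start]
    for hops in range(1, max_hops + 1):
        frontier = [nxt for node in frontier for nxt in graph[node]]
        if not frontier:
            break
        results.extend((start, node, hops) for node in frontier)
    return results


def vle_dfs(graph, start, max_hops):
    """Follow one path at a time and yield each match as soon as it is found;
    only the current path lives on the call stack."""

    def visit(node, hops):
        for nxt in graph[node]:
            yield (start, nxt, hops + 1)
            if hops + 1 < max_hops:
                yield from visit(nxt, hops + 1)

    yield from visit(start, 0)


bfs_rows = vle_bfs(GRAPH, "a", 5)
dfs_rows = list(vle_dfs(GRAPH, "a", 5))
first_dfs_row = next(vle_dfs(GRAPH, "a", 5))  # available before the rest is computed
print(bfs_rows)
print(dfs_rows)
```

Both functions find the same paths. The DFS generator can hand its first row to the next operator immediately, while the BFS version must hold a complete frontier in memory at every level.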
Example Cypher Plan
• match (a)-[*1..5]->(b) return a, b;
Considerations for Graph Query Performance
• Graph pattern matching is usually more efficient with random page reads
  • set random_page_cost = 0.005
  • Caching the data in memory or using SSDs makes graph traversal faster
• Index-only scans are important for graph traversals
  • They are possible when the query does not access edge properties
• Query optimization is crucial, but it is harder than for SQL queries
  • Graph queries involve many joins
  • Size estimates become less accurate as the number of joins increases
  • PostgreSQL’s optimizer usually works well but needs improvement and more research
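A back-of-the-envelope calculation shows why size estimates degrade as joins are added. The model is a deliberate simplification and is not from the slides: it assumes every join's size estimate is off by the same multiplicative factor and that the errors compound independently.

```python
def compounded_error(per_join_error, joins):
    """Worst-case factor by which the final size estimate can be off,
    assuming each join multiplies the error by the same amount."""
    return per_join_error ** joins


# A modest 2x error per join grows quickly with the length of the pattern.
for joins in (1, 3, 5, 10):
    print(joins, "joins ->", compounded_error(2.0, joins), "x")
```

Under this assumption, a graph pattern that needs ten joins can see its estimate drift by three orders of magnitude, which is enough to push the planner toward a poor join order.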
LDBC Benchmark
• Linked Data Benchmark Council (http://ldbcouncil.org)
• Participants (http://ldbcouncil.org/industry/members)
  • Oracle Labs, IBM, Huawei, SAP, Sparsity, OpenLink SW, Ontotext, Neo Technology
• Benchmark tool for graph workloads
  • Social network benchmark (SNB)
    • Simulating social network service workloads
  • Graph analytics
  • Semantic publishing benchmark
    • For RDF and SPARQL
• We ran the SNB interactive workloads
Performance Comparisons
• Caveat
  • We optimized both databases as much as we could
  • The benchmark results can vary with configuration settings
• Comparisons
  • Neo4j 3.1 community edition
  • AgensGraph 1.0
Future Roadmap
• Distributed and parallel processing
  • Extend AgensGraph using Postgres-XL
• Graph analysis framework, such as a vertex-centric programming model
• Support for more graph analysis algorithms
• Integration with big data systems for large-scale graph processing
Thank You!
http://agensgraph.com
Github: https://github.com/bitnine-oss/agensgraph
Bitnine Global
• Headquartered in Seoul, Korea; founded in 2014
• R&D center in Santa Clara, USA
• Provides technical services for PostgreSQL and big data
• Partners with IBM and Cloudera