AgensGraph: a Multi-Model Graph Database Based on PostgreSQL

AgensGraph: a Multi-Model Graph Database Based on PostgreSQL, Kisung Kim ([email protected]), Bitnine R&D Center, 2017-1-14

Upload: kisung-kim

Post on 12-Apr-2017


TRANSCRIPT

Page 1: AgensGraph: a Multi-model Graph Database based on PostgreSql

AgensGraph: a Multi-Model Graph Database Based on PostgreSQL

Kisung Kim ([email protected]), Bitnine R&D Center

2017-1-14

Page 2: AgensGraph: a Multi-model Graph Database based on PostgreSql

Who am I

• Kisung Kim, Ph.D., Chief Technology Officer of Bitnine Global Inc.

• Researched query optimization for graph-structured data during his doctoral studies

• Developed a distributed relational database engine at TmaxSoft

• Leads the development of a new graph database, AgensGraph, at Bitnine Global

Page 3: AgensGraph: a Multi-model Graph Database based on PostgreSql

What is a Graph Database?

Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j

Page 4: AgensGraph: a Multi-model Graph Database based on PostgreSql

What is a Graph Database?

• Relationships are first-class citizens in a graph database

• Make your data connected in the graph database

                Relational Database    Graph Database
Entity          Row                    Node (Vertex)
Relationship    Row                    Relationship (Edge)

Page 5: AgensGraph: a Multi-model Graph Database based on PostgreSql

What is a Graph Database?

• Handles data from a different point of view

• Data model similar to entity-relationship model

• Gartner says it represents a radical change in how data is organized and processed

Page 6: AgensGraph: a Multi-model Graph Database based on PostgreSql

Cypher Query Language

• Declarative query language for the property graph model

• Inspired by SQL and SPARQL

– Designed to be a human-readable query language

• Developed by Neo Technology Inc. since 2011

• Current version is 3.0

• OpenCypher.org (http://opencypher.org)

– Participate in developing the query language

Page 7: AgensGraph: a Multi-model Graph Database based on PostgreSql

Cypher Query Example

Make two nodes
CREATE (:person {id: 1, name: "Kisung Kim", birthday: "1980-01-05"});
CREATE (:company {id: 1, name: "Bitnine Global"});

Make a relationship between the two nodes
MATCH (p:person {id: 1}), (c:company {id: 1})
CREATE (p)-[:workFor {title: "CTO", since: 2014}]->(c);

(Diagram: Kisung Kim -[workFor]-> Bitnine Global)

Page 8: AgensGraph: a Multi-model Graph Database based on PostgreSql

Cypher Query Example

Querying
MATCH (p:person {name: "Kisung Kim"})-[:workFor]->(c:company)
RETURN (p), (c)

(Diagram: Kisung Kim -[workFor]-> ?)

No Table Definitions and No Joins

Query with variable length relationships
MATCH (p:person {name: "Kisung Kim"})-[:knows*..3]->(f:person)
RETURN (f)

(Diagram: Kisung Kim -[knows]-> ? -[knows]-> ? -[knows]-> ?)
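The pattern `[:knows*..3]` reads as "follow between one and three knows relationships". A minimal Python sketch of that idea follows, assuming a small in-memory adjacency dict. The function name `reachable_within` and every person other than Kisung Kim are hypothetical. Unlike Cypher, which returns one row per matching path, this sketch returns only the distinct vertices reached and ignores paths that lead back to the start.

```python
from collections import deque

# Hypothetical sample data: directed "knows" edges, person -> people they know.
knows = {
    "Kisung Kim": ["A"],
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
}

def reachable_within(graph, start, max_hops):
    """Return the vertices reachable from start in 1..max_hops directed steps."""
    found = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found
```

With the sample data, a limit of three hops reaches A, B, and C but not D, which is four hops away.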

Page 9: AgensGraph: a Multi-model Graph Database based on PostgreSql

GraphDB to PostgreSQL Case

• From Hipolabs

http://engineering.hipolabs.com/graphdb-to-postgresql/

Page 10: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Database and Hybrid Database

Magic Quadrant for Operational Database Management Systems, Gartner, 2016

Page 11: AgensGraph: a Multi-model Graph Database based on PostgreSql

So, What We Want to Make is

• Hybrid database engine with graph and relational model

• Cypher query processing on PostgreSQL

• Online transactional graph database

• Disk-based persistent graph storage

( ) -[:processes]->(Cypher)

Page 12: AgensGraph: a Multi-model Graph Database based on PostgreSql

Why Did We Choose PostgreSQL?

• Fully-featured, enterprise-ready open source database

• Graph processing actually uses relational algebra

– The graph is serialized as tables on disk

– Every graph traversal step is in principle a join (from LDBC documentation)

• It is important to optimize joins and speed up join processing

– PostgreSQL has an excellent query optimizer

• And the rich ecosystem of PostgreSQL
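To make "every traversal step is in principle a join" concrete, here is a minimal sketch that stores the two example nodes and the workFor relationship in plain tables and answers the one-hop pattern with joins. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table and column names are assumptions for illustration, not AgensGraph's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (start_id INTEGER, end_id INTEGER, label TEXT);
    INSERT INTO vertex VALUES (1, 'Kisung Kim'), (2, 'Bitnine Global');
    INSERT INTO edge VALUES (1, 2, 'workFor');
""")

# One traversal step (p)-[:workFor]->(c) expressed as two joins:
# vertex -> edge on the start ID, then edge -> vertex on the end ID.
rows = db.execute("""
    SELECT p.name, c.name
    FROM vertex AS p
    JOIN edge AS e ON e.start_id = p.id
    JOIN vertex AS c ON c.id = e.end_id
    WHERE p.name = 'Kisung Kim' AND e.label = 'workFor'
""").fetchall()
```

Each additional hop in a pattern adds another pair of joins, which is why join ordering and join speed matter so much for graph queries.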

Page 13: AgensGraph: a Multi-model Graph Database based on PostgreSql

Challenges

• How to store graph data

– Efficient structure for graph pattern matching

– At the same time, efficient for transaction processing

• How to process graph queries

– Processing complex graph pattern matching: variable length path, shortest path

– Mismatches between graph data model & relational data model

– Graph query optimization

Page 14: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Storage

• Graph data is stored on disk, decomposed into vertexes and edges

• When processing graph pattern matching, it is essential to find adjacent vertexes or edges efficiently

– Given a start vertex, find end vertexes

– Given an end vertex, find start vertexes

Page 15: AgensGraph: a Multi-model Graph Database based on PostgreSql

Two Graph Databases

Solution            Company          Latest Version   Features
Neo4j               Neo Technology   3.1              Most famous graph database; Cypher; O(1) access using fixed-size array
Titan (DSE Graph)   Datastax         -                Distributed graph system based on Cassandra

Page 16: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Storage – Neo4j

• Fixed-size array for nodes and relationships

• Relationships for a node are organized as a doubly-linked list

• Index-free adjacency

• O(1) access for adjacent edges: follow the pointer

From Graph Databases 2nd ed. O’Reilly, 2015
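The idea of fixed-size records with pointer chains can be sketched in a few lines of Python. This is a simplified illustration and not Neo4j's actual record layout: a record ID is just a list index, and each node's relationship chain is singly linked here, whereas the slide describes a doubly-linked list. All names (`NodeRecord`, `RelRecord`, `add_rel`, `rels_of`) are hypothetical.

```python
from dataclasses import dataclass

NIL = -1  # marks "no record"

@dataclass
class NodeRecord:
    first_rel: int = NIL  # ID of the first relationship in this node's chain

@dataclass
class RelRecord:
    start: int
    end: int
    next_for_start: int = NIL  # next relationship in the start node's chain
    next_for_end: int = NIL    # next relationship in the end node's chain

nodes = []  # record ID == list index, so lookup by ID is O(1)
rels = []

def add_node():
    nodes.append(NodeRecord())
    return len(nodes) - 1

def add_rel(start, end):
    rel_id = len(rels)
    # Push the new record onto the front of both endpoint chains.
    rels.append(RelRecord(start, end, nodes[start].first_rel, nodes[end].first_rel))
    nodes[start].first_rel = rel_id
    nodes[end].first_rel = rel_id
    return rel_id

def rels_of(node_id):
    """Walk a node's relationship chain by following record IDs, with no index lookup."""
    rel_id = nodes[node_id].first_rel
    while rel_id != NIL:
        rel = rels[rel_id]
        yield rel_id
        rel_id = rel.next_for_start if rel.start == node_id else rel.next_for_end
```

Reaching the next adjacent relationship costs one array access regardless of graph size, which is the property the slide calls index-free adjacency.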

Page 17: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Storage – Titan (DSE Graph)

• Titan stores graphs in adjacency list format

• Each edge is stored twice

• Vertex and edge lists are stored in a backend store such as HBase, Cassandra, or BerkeleyDB

From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
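The adjacency list format with each edge stored twice can be sketched as follows. This is a minimal in-memory illustration, not Titan's storage code: a plain dict stands in for the backend store, and the names `adjacency`, `add_edge`, and `neighbors` are hypothetical.

```python
from collections import defaultdict

# vertex ID -> list of (direction, edge label, other vertex ID)
adjacency = defaultdict(list)

def add_edge(start, label, end):
    # The same edge is written twice, once in each endpoint's list.
    adjacency[start].append(("OUT", label, end))
    adjacency[end].append(("IN", label, start))

def neighbors(vertex, direction):
    """Adjacent vertexes in one direction, read from a single vertex's list."""
    return [other for d, _, other in adjacency.get(vertex, []) if d == direction]
```

Storing the edge on both sides doubles the write cost but lets a traversal in either direction read just one vertex's list.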

Page 18: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Storage – AgensGraph

• Fixed-size array is hard to implement in PostgreSQL

– Tuples are moved when updated

• Titan’s big row approach is also inadequate

• We chose B-tree index for graph traversal

(Diagram: a graph consists of a Vertex table and an Edge table)

Vertex table: Vertex ID, Properties; B-tree on Vertex ID

Edge table: Edge ID, Start Vertex ID, End Vertex ID, Properties; B-tree on (Start, End) and B-tree on (End, Start)
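The layout above can be tried out with a small sketch. It uses Python's built-in sqlite3 module (whose indexes are also B-trees) as a stand-in for PostgreSQL. The table, column, and index names are assumptions for illustration and are not AgensGraph's actual catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, properties TEXT);
    CREATE TABLE edge (
        id INTEGER PRIMARY KEY,
        start_id INTEGER NOT NULL,
        end_id INTEGER NOT NULL,
        properties TEXT
    );
    -- One composite index per traversal direction.
    CREATE INDEX edge_start_end ON edge (start_id, end_id);
    CREATE INDEX edge_end_start ON edge (end_id, start_id);
    INSERT INTO vertex (id) VALUES (1), (2), (3);
    INSERT INTO edge (start_id, end_id) VALUES (1, 2), (1, 3), (3, 1);
""")

def out_neighbors(v):
    """Given a start vertex, find end vertexes (served by the (start, end) index)."""
    rows = conn.execute(
        "SELECT end_id FROM edge WHERE start_id = ? ORDER BY end_id", (v,))
    return [r[0] for r in rows]

def in_neighbors(v):
    """Given an end vertex, find start vertexes (served by the (end, start) index)."""
    rows = conn.execute(
        "SELECT start_id FROM edge WHERE end_id = ? ORDER BY start_id", (v,))
    return [r[0] for r in rows]
```

Because each composite index contains both vertex IDs, a lookup in either direction can in principle be answered from the index alone, at the cost of keeping two indexes on every edge.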

Page 19: AgensGraph: a Multi-model Graph Database based on PostgreSql

Index Problems

• The current B-tree has several disadvantages for our workload

– A composite index is preferable, but the index size increases

– There are many duplicate keys (vertex IDs) on start_ID or end_ID

– Property updates incur insertions into B-trees

• We are developing a new index with a bucket structure (like the GIN index), indirect indexing, and support for index-only scans for graph traversals

Page 20: AgensGraph: a Multi-model Graph Database based on PostgreSql

Graph Storage – AgensGraph

• Vertexes and edges are grouped into labels

• Labels are organized as a label hierarchy

• We use PostgreSQL’s table hierarchy feature

(Diagram: label tables ag_vertex, Person, Message, Comment, and Post, each with Vertex ID and Properties columns)
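A label hierarchy behaves like table inheritance: scanning a parent label also returns the vertexes stored under its child labels. The sketch below illustrates that behavior in plain Python. The parent/child arrangement (Person and Message under ag_vertex, Comment and Post under Message) is an assumption about the diagram, and the vertex IDs and function names are hypothetical.

```python
# child label -> parent label (assumed arrangement of the labels on the slide)
parent = {
    "Person": "ag_vertex",
    "Message": "ag_vertex",
    "Comment": "Message",
    "Post": "Message",
}

# label -> vertex IDs stored directly under that label (hypothetical data)
stored = {"Person": [1], "Message": [], "Comment": [2], "Post": [3, 4]}

def is_a(label, ancestor):
    """True if label equals ancestor or lies below it in the hierarchy."""
    while label is not None:
        if label == ancestor:
            return True
        label = parent.get(label)
    return False

def scan(label):
    """Vertexes of a label and of every label below it, like a parent-table scan."""
    return sorted(v for lab, ids in stored.items() if is_a(lab, label) for v in ids)
```

Under these assumptions, scanning Message returns the Comment and Post vertexes, and scanning ag_vertex returns every vertex in the graph.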

Page 21: AgensGraph: a Multi-model Graph Database based on PostgreSql

Current Status

• AgensGraph v0.9 (https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)

– Graph data model and DDL on PostgreSQL 9.6

– Cypher query processing (70% of OpenCypher spec.)

– Integrated query processing (Cypher + SQL)

– Client libraries (JDBC, ODBC, Python)

– Monitoring and development using Tadpole DB Hub

Page 22: AgensGraph: a Multi-model Graph Database based on PostgreSql

Tadpole for AgensGraph

• Tadpole DB Hub is an open-source project for managing unified infrastructure (https://github.com/hangum/TadpoleForDBTools)

• Supports various databases, including PostgreSQL and AgensGraph

• Features of Tadpole for AgensGraph

– Monitoring the AgensGraph server

– Cypher query browser and graph visualization

Page 23: AgensGraph: a Multi-model Graph Database based on PostgreSql

Tadpole for AgensGraph

Page 24: AgensGraph: a Multi-model Graph Database based on PostgreSql

Future Roadmap

• Distributed graph database

– Plan to exploit Postgres-XL

• Specialized storage and index for graph traversals

• Dictionary compression for JSONB (ZSON)

• Graph query optimization using graph statistics

• Integration with big data systems

– HDFS Storage

– Graph analysis using GraphX

Page 25: AgensGraph: a Multi-model Graph Database based on PostgreSql

Join Us

• AgensGraph is an open-source project https://github.com/bitnine-oss/agens-graph

• We also wish to contribute to the PostgreSQL community

• Graph database meetup in Silicon Valley

– http://www.meetup.com/Graph-Database-in-Silicon-Valley/

Page 26: AgensGraph: a Multi-model Graph Database based on PostgreSql

Thank you

[email protected]