NATC 2013 - Using Graph Databases for Insights into Connected Data

Upload: nasscom

Post on 23-Jun-2015

DESCRIPTION

NASSCOM Annual Technology Conference 2013
Session: Using Graph Databases for Insights into Connected Data
Speaker: Gagan Agarwal, Xebia

TRANSCRIPT

Page 1: NATC 2013 - Using Graph Databases for Insights into Connected Data

Xebia India 1

Using Graph Databases For Insights Into Connected Data

Gagan Agrawal

Page 2: NATC 2013 - Using Graph Databases for Insights into Connected Data

Agenda

- High-level view of the graph space
- Comparison with RDBMS and other NoSQL stores
- Data modeling
- Cypher: graph query language
- Graph database internals
- Graphs in the real world

Page 3: NATC 2013 - Using Graph Databases for Insights into Connected Data

What is a Graph?

Page 4: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph

Page 5: NATC 2013 - Using Graph Databases for Insights into Connected Data

What is a Graph?

- A collection of vertices and edges.
- A set of nodes and the relationships that connect them.
- A graph represents:
  - Entities as NODES
  - The ways those entities relate to the world as RELATIONSHIPS
- Allows modeling of all kinds of scenarios:
  - Road systems
  - Medical histories
  - Supply chain management
  - Data centers
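
As a rough illustration of nodes and relationships, here is a minimal sketch of a road system held as a graph. The city names, the `ROAD_TO` relationship type, and the `neighbours` helper are assumptions made up for this example; they do not come from the slides.

```python
# Hypothetical road-system example: cities are nodes, roads are relationships.
nodes = {"Delhi", "Agra", "Jaipur"}

# Each relationship is (start node, relationship type, end node).
relationships = [
    ("Delhi", "ROAD_TO", "Agra"),
    ("Delhi", "ROAD_TO", "Jaipur"),
    ("Agra", "ROAD_TO", "Jaipur"),
]

def neighbours(node):
    """Return the nodes reachable from `node` by following one relationship."""
    return sorted(end for start, _type, end in relationships if start == node)

print(neighbours("Delhi"))
```

The same shape (entities plus typed connections) would apply to the other scenarios listed, such as a supply chain or a data center.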

Page 6: NATC 2013 - Using Graph Databases for Insights into Connected Data

Example – Twitter's Data

Page 7: NATC 2013 - Using Graph Databases for Insights into Connected Data

Example – Twitter's Data

Page 8: NATC 2013 - Using Graph Databases for Insights into Connected Data

High-Level View of the Graph Space

- Graph Databases: technologies used primarily for transactional online graph persistence (OLTP).
- Graph Compute Engines: technologies used primarily for offline graph analytics (OLAP).

Page 9: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph Databases

- An online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model.
- Built for use with transactional (OLTP) systems.
- Used for richly connected data.
- Querying is performed through traversals.
- Can perform millions of traversal steps per second.
- A traversal step resembles a join in an RDBMS.

Page 10: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph Database Properties

The Underlying Storage : Native / Non-Native

The Processing Engine : Native / Non-Native

Page 11: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph DB – The Underlying Storage

- Native graph storage: optimized and designed for storing and managing graphs.
- Non-native graph storage: serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.

Page 12: NATC 2013 - Using Graph Databases for Insights into Connected Data

Native Graph Storage

Page 13: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph DB – The Processing Engine

Index-free adjacency: connected nodes physically point to each other in the database.
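
The difference between the two processing styles can be sketched in a few lines. This is only an in-memory analogy, not how any particular database lays out its storage; the `Node` class and the `index` dictionary are assumptions for illustration. In a real non-native engine the index is typically a disk-backed structure whose lookup cost grows with the data, which a Python dict does not reproduce.

```python
class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct object references, not keys to look up

    def connect(self, other):
        self.outgoing.append(other)

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
bob.connect(carol)

# Native style: one hop is just following a reference held by the node.
native_hop = [n.name for n in alice.outgoing]

# Non-native style: neighbours are stored as keys, so every hop
# goes back through a global index (here, a dict keyed by name).
index = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": []}
lookup_hop = index["Alice"]
```

Both hops return the same neighbour; the point is where the work happens: on the node itself, or in a shared index that every hop must consult.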

Page 14: NATC 2013 - Using Graph Databases for Insights into Connected Data

Non-Native : Index Look-Up

Page 15: NATC 2013 - Using Graph Databases for Insights into Connected Data

Native : Index Free Adjacency

Page 16: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph Databases

Page 17: NATC 2013 - Using Graph Databases for Insights into Connected Data

Power of Graph Databases

Performance

Flexibility

Agility

Page 18: NATC 2013 - Using Graph Databases for Insights into Connected Data

Comparison

- Relational Databases
- NoSQL Databases
- Graph Databases

Page 19: NATC 2013 - Using Graph Databases for Insights into Connected Data

Relational Databases Lack Relationships

- Initially designed to codify paper forms and tabular structures.
- Deal poorly with relationships.
- The rise in connectedness translates into increased joins.
- Lower performance.
- Difficult to cater for changing business needs.

Page 20: NATC 2013 - Using Graph Databases for Insights into Connected Data

RDBMS

Page 21: NATC 2013 - Using Graph Databases for Insights into Connected Data

Query to find friends-of-friends

Page 22: NATC 2013 - Using Graph Databases for Insights into Connected Data

NoSQL Databases Also Lack Relationships

- NoSQL databases (e.g. key-value, document, or column-oriented) store sets of disconnected values/documents/columns.
- This makes it difficult to use them for connected data and graphs.
- One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
  - This effectively introduces foreign keys.
  - It requires joining aggregates at the application level.
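
To make the application-level join concrete, here is a minimal sketch with a plain dictionary standing in for a document store. The `users` data, the `friend_ids` field, and the `friend_names` helper are hypothetical names chosen for this example.

```python
# Hypothetical document store: each user document embeds its friends' IDs.
users = {
    "u1": {"name": "Alice", "friend_ids": ["u2", "u3"]},
    "u2": {"name": "Bob", "friend_ids": []},
    "u3": {"name": "Carol", "friend_ids": ["u1"]},
}

def friend_names(user_id):
    """Resolve embedded IDs with one extra lookup per friend (the 'join')."""
    doc = users[user_id]
    return [users[fid]["name"] for fid in doc["friend_ids"]]

print(friend_names("u1"))
```

The store only hands back documents by key; stitching them together is left to application code like `friend_names`.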

Page 23: NATC 2013 - Using Graph Databases for Insights into Connected Data

NoSQL DB

- Relationships between aggregates aren't first-class citizens in the data model.
- Foreign aggregate "links" are not reflexive.
- Need to use some external compute infrastructure, e.g. Hadoop, for such processing.
- Do not maintain consistency of connected data.
- Do not support index-free adjacency.
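
One way to read the point about links not being reflexive: a link stored in one aggregate can be followed forwards cheaply, but the reverse question needs a scan over everything. The sketch below assumes a hypothetical `follows` mapping and a `followers_of` helper; neither comes from the slides.

```python
# Hypothetical aggregates: each key stores the IDs it points to.
follows = {
    "u1": ["u2", "u3"],
    "u2": ["u3"],
    "u3": [],
}

def followers_of(target_id):
    """Answer the reverse question by scanning every aggregate."""
    return sorted(uid for uid, targets in follows.items() if target_id in targets)

print(followers_of("u3"))
```

On a large dataset this full scan is the kind of work that tends to be pushed to a batch system rather than answered online.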

Page 24: NATC 2013 - Using Graph Databases for Insights into Connected Data

NoSQL DB

Page 25: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph DB Embraces Relationships

Page 26: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph DB

- Find friends-of-friends in a social network, to a maximum depth of 5.
- Total records: 1,000,000
- Each with approximately 50 friends
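
A depth-limited friends-of-friends search can be sketched as a breadth-first traversal. This is an illustrative in-memory version on a tiny made-up graph, not the benchmark setup described on the slide; `friends_within` and the sample `graph` are assumptions.

```python
from collections import deque

def friends_within(graph, start, max_depth):
    """Return everyone reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                found.add(friend)
                queue.append((friend, depth + 1))
    return found

# Hypothetical chain of friendships: A -> B -> C -> D -> E -> F -> G.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["F"], "F": ["G"]}
print(sorted(friends_within(graph, "A", 5)))
```

Each loop iteration follows relationships from one person only, so the work done is tied to the neighbourhood explored rather than to the total number of records.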

Page 27: NATC 2013 - Using Graph Databases for Insights into Connected Data

NoSQL Comparison

Page 28: NATC 2013 - Using Graph Databases for Insights into Connected Data

Data Modeling with Graph

Page 29: NATC 2013 - Using Graph Databases for Insights into Connected Data

Data Modeling

- “Whiteboard” friendly
- The typical whiteboard view of a problem is a GRAPH.
- What we sketch in our creative and analytical modes maps closely to the data model inside the database.

Page 30: NATC 2013 - Using Graph Databases for Insights into Connected Data

The Property Graph Model

Page 31: NATC 2013 - Using Graph Databases for Insights into Connected Data

Cypher: Graph Query Language

- Pattern-matching query language
- Humane language
- Expressive
- Declarative: say what you want, not how
- Borrows from well-known query languages
- Aggregation, ordering, limit
- Update the graph

Page 32: NATC 2013 - Using Graph Databases for Insights into Connected Data

Cypher

Cypher representation:

(c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)

(c)-[:KNOWS]->(b)-[:KNOWS]->(a)<-[:KNOWS]-(c)

Page 33: NATC 2013 - Using Graph Databases for Insights into Connected Data

Cypher

START c=node:user(name='Michael')
MATCH (c)-[:KNOWS]->(b)-[:KNOWS]->(a), (c)-[:KNOWS]->(a)
RETURN a, b
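
To show what the pattern in this query is asking for, the sketch below does the same triangle match by hand in Python. It is not how Cypher executes the query; the `knows` data and the `match_triangle` helper are hypothetical, and the names other than Michael are made up.

```python
# Hypothetical KNOWS relationships: knows[x] is the set of people x knows.
knows = {
    "Michael": {"Bob", "Alice"},
    "Bob": {"Alice", "Dave"},
    "Alice": set(),
    "Dave": set(),
}

def match_triangle(c):
    """Find (a, b) such that c KNOWS b, b KNOWS a, and c KNOWS a."""
    results = []
    for b in sorted(knows.get(c, ())):
        for a in sorted(knows.get(b, ())):
            if a in knows[c]:
                results.append((a, b))
    return results

print(match_triangle("Michael"))
```

The declarative query states only the shape of the match; the nested loops here are one possible way to find it.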

Page 34: NATC 2013 - Using Graph Databases for Insights into Connected Data

Other Cypher Clauses

- WHERE: provides criteria for filtering pattern-matching results.
- CREATE and CREATE UNIQUE: create nodes and relationships.
- DELETE: removes nodes, relationships, and properties.
- SET: sets property values.

Page 35: NATC 2013 - Using Graph Databases for Insights into Connected Data

Other Cypher Clauses

- FOREACH: performs an updating action for each graph element in a list.
- UNION: merges results from two or more queries.
- WITH: chains subsequent query parts and forwards results from one to the next, similar to piping commands in UNIX.

Page 36: NATC 2013 - Using Graph Databases for Insights into Connected Data

Comparison of Relational and Graph Modeling

Page 37: NATC 2013 - Using Graph Databases for Insights into Connected Data

Systems Management Domain

Page 38: NATC 2013 - Using Graph Databases for Insights into Connected Data

Tables and Relationships

Page 39: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph Representation

Page 40: NATC 2013 - Using Graph Databases for Insights into Connected Data

Query to find faulty Equipment

Page 41: NATC 2013 - Using Graph Databases for Insights into Connected Data

Matched Paths

Page 42: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graph Database Internals

Page 43: NATC 2013 - Using Graph Databases for Insights into Connected Data

Non-Functional Characteristics

- Transactions: fully ACID
- Recoverability
- Availability
- Scalability

Page 44: NATC 2013 - Using Graph Databases for Insights into Connected Data

Scalability

- Capacity (graph size)
- Latency (response time)
- Read and write throughput

Page 45: NATC 2013 - Using Graph Databases for Insights into Connected Data

Capacity

- The 1.9 release of Neo4j can support single graphs having tens of billions of nodes, relationships, and properties.
- The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph.

Page 46: NATC 2013 - Using Graph Databases for Insights into Connected Data

Latency

- RDBMS: more data in tables/indexes results in longer join operations.
- A graph DB doesn't suffer the same latency problem.
- An index is used to find the starting node.
- Traversal uses a combination of pointer chasing and pattern matching to search the data.
- Performance does not depend on the total size of the dataset.
- It depends only on the data being queried.
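
The claim that cost follows the data being queried, not the dataset size, can be illustrated with a small in-memory traversal. This is a sketch under simplifying assumptions (a dict as the "index" for the starting node, a list of neighbours per node); `reachable` and the sample data are made up for the example.

```python
def reachable(adjacency, start):
    """Traverse from `start`, returning the set of nodes actually touched."""
    touched, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in touched:
            continue
        touched.add(node)
        stack.extend(adjacency.get(node, ()))
    return touched

# Small connected neighbourhood that the query cares about.
adjacency = {"a": ["b", "c"], "b": ["c"], "c": []}
before = len(reachable(adjacency, "a"))

# Add 10,000 unrelated nodes; the traversal from "a" never visits them.
for i in range(10_000):
    adjacency[f"other{i}"] = []
after = len(reachable(adjacency, "a"))

print(before, after)
```

The traversal touches the same nodes before and after the dataset grows, because it only follows relationships out of the starting node.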

Page 47: NATC 2013 - Using Graph Databases for Insights into Connected Data

Throughput

Constant performance irrespective of graph size.

Page 48: NATC 2013 - Using Graph Databases for Insights into Connected Data

Graphs in the Real World

Page 49: NATC 2013 - Using Graph Databases for Insights into Connected Data

Common Use Cases

- Social
- Recommendations
- Geo
- Logistics networks: for package routing, finding the shortest path
- Financial transaction graphs: for fraud detection
- Master data management
- Bioinformatics: Era7, to relate a complex web of information that includes genes, proteins, and enzymes
- Authorization and access control: Adobe Creative Cloud, Telenor

Page 50: NATC 2013 - Using Graph Databases for Insights into Connected Data

Who uses Neo4j ?

Page 51: NATC 2013 - Using Graph Databases for Insights into Connected Data

Resources

Page 52: NATC 2013 - Using Graph Databases for Insights into Connected Data

Thank You