Transcript
Page 1: An Introduction to Graph Databases

www.Objectivity.com

An Introduction To Graph Databases

August 20, 2013

Leon Guzenda & Nick Quinn

Page 2: An Introduction to Graph Databases

Overview

• Introductions

• Graph Theory

• Commonly Used Graph Algorithms

• Graph Databases

• Current Implementations

• Use Cases

• Hands-On Tutorial

Page 3: An Introduction to Graph Databases

We Are From Objectivity Inc.

Copyright © Objectivity, Inc. 2012

Company

• Objectivity, Inc. is headquartered in Sunnyvale, CA.

• Established in 1988 to tackle database problems that network/hierarchical/relational and file-based technologies struggle with.

• Objectivity has over two decades of Big Data and NoSQL experience.

Products

• Develops NoSQL platforms for managing and discovering relationships and patterns in complex data:

• Objectivity/DB - an object database that manages localized, centralized or distributed databases

• InfiniteGraph - a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data

Markets

• The Big Data market is projected to be around $12B in 2012, with a CAGR of 28% over the next five years.

• 40% per year data growth, cloud adoption, mobile usage and improved real-time analytics underpin Objectivity’s growth opportunities as a Big Data analytics enabler.

Customers

• Embedded in hundreds of enterprises, government organizations and products - millions of deployments.

Financials

• Consistently generates increased revenues.

• Privately held by the employees and a few venture capital companies.

Page 4: An Introduction to Graph Databases

GRAPH THEORY

Page 5: An Introduction to Graph Databases

The History of Graph Theory

1736: Leonhard Euler writes a paper on the “Seven Bridges of Königsberg”

1845: Gustav Kirchhoff publishes his electrical circuit laws

1852: Francis Guthrie poses the “Four Color Problem”

1878: Sylvester publishes an article in Nature magazine that describes graphs

1936: Dénes Kőnig publishes a textbook on Graph Theory

1941: Ramsey and Turán define Extremal Graph Theory

1959: De Bruijn publishes a paper summarizing Enumerative Graph Theory

1959: Erdős, Rényi and Gilbert define Random Graph Theory

1969: Heinrich Heesch publishes a method for attacking the “Four Color” problem (the proof is completed by Appel and Haken in 1976)

2003: Commercial Graph Database products start appearing on the market

Page 6: An Introduction to Graph Databases

Graph Theory Terminology...

VERTEX: A single node in a graph data structure

EDGE: A connection between a pair of VERTICES

PROPERTIES: Data items that belong to a particular Vertex or Edge

WEIGHT: A quantity associated with a particular Edge

GRAPH: A network of linked Vertex and Edge objects

Vertex 1 - City: San Francisco, Pop: 812,826

Edge 1 - Road: I-101, Miles: 47.8

Vertex 2 - City: San Jose, Pop: 967,487
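As a rough illustration of the terms above, the San Francisco / San Jose example can be written as a tiny in-memory property graph. This is a minimal Python sketch, not any product's API: the class names `Vertex`, `Edge` and `Graph`, their fields, and the choice of the road's mileage as the edge weight are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A single node; `properties` holds its data items."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A connection between a pair of vertices, with an optional weight."""
    tail: Vertex
    head: Vertex
    properties: dict = field(default_factory=dict)
    weight: float = 0.0


@dataclass
class Graph:
    """A network of linked Vertex and Edge objects."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_vertex(self, name, **properties):
        vertex = Vertex(name, properties)
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, tail, head, weight=0.0, **properties):
        edge = Edge(tail, head, properties, weight)
        self.edges.append(edge)
        return edge


graph = Graph()
sf = graph.add_vertex("Vertex 1", City="San Francisco", Pop=812_826)
sj = graph.add_vertex("Vertex 2", City="San Jose", Pop=967_487)
# Assumption: the mileage from the slide is used as the edge weight.
road = graph.add_edge(sf, sj, weight=47.8, Road="I-101")
```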

Page 7: An Introduction to Graph Databases

...Graph Theory Terminology...

SIMPLE/UNDIRECTED GRAPH: A Graph where each VERTEX may be linked to one or more Vertex objects via Edge objects and each Edge object is connected to exactly two Vertex objects. Furthermore, neither Vertex connected to an Edge is more significant than the other.

DIRECTED GRAPH: A Graph in which every Edge has a direction, so that in each Vertex + Edge + Vertex group (an “Arc” or “Path”) one Vertex is the “Head” and the other is the “Tail”.

MIXED GRAPH: A Graph in which some paths are Undirected and others are Directed.
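One way to picture the difference between these three kinds of graph is an adjacency map in which an undirected Edge is stored in both directions and a directed one only from Tail to Head. The sketch below is illustrative only; the `(tail, head, directed)` triple format and the function name `build_adjacency` are assumptions, not part of any graph database API.

```python
def build_adjacency(edges):
    """Build {vertex: set of reachable neighbours}.

    `edges` is an iterable of (tail, head, directed) triples.
    """
    adjacency = {}
    for tail, head, directed in edges:
        adjacency.setdefault(tail, set()).add(head)
        adjacency.setdefault(head, set())
        if not directed:
            # Neither end is more significant, so store the reverse link too.
            adjacency[head].add(tail)
    return adjacency


# A mixed graph: one undirected edge and one directed arc.
mixed = build_adjacency([
    ("A", "B", False),  # undirected: A and B are equals
    ("B", "C", True),   # directed: tail B, head C
])
```

Passing only `False` flags gives an undirected graph, only `True` flags a directed one.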

Page 8: An Introduction to Graph Databases

...Graph Theory Terminology

LOOP: An Edge that connects a Vertex to itself

MULTIGRAPH: A Graph that allows multiple Edges and Loops

QUIVER: A Graph where Vertices are allowed to be connected by multiple Arcs. A Quiver may include Loops.

WEIGHTED GRAPH: A Graph where a quantity is assigned to an Edge, e.g. a Length assigned to an Edge representing a road between two Vertices representing cities.

HALF EDGE: An Edge that is only connected to a single Vertex

LOOSE EDGE: An Edge that isn't connected to any Vertices.

CONNECTIVITY: Two Vertices are Connected if it is possible to find a path between them.
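Weights and connectivity come together when looking for the cheapest route between two Vertices. The following is a small sketch using Dijkstra's algorithm on an undirected weighted graph. Only the 47.8 mile figure comes from the earlier slide; the other cities, mileages and the function name are made up for illustration.

```python
import heapq


def shortest_weighted_distance(roads, start, goal):
    """Dijkstra's algorithm over an undirected weighted graph.

    `roads` is a list of (city_a, city_b, miles) tuples.
    Returns the smallest total weight, or None if the two
    vertices are not connected.
    """
    adjacency = {}
    for a, b, miles in roads:
        adjacency.setdefault(a, []).append((b, miles))
        adjacency.setdefault(b, []).append((a, miles))

    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        distance, city = heapq.heappop(queue)
        if city == goal:
            return distance
        if distance > best.get(city, float("inf")):
            continue  # stale queue entry
        for neighbour, miles in adjacency.get(city, []):
            candidate = distance + miles
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None


# Hypothetical road network; only the 47.8 figure is from the slides.
roads = [
    ("San Francisco", "San Jose", 47.8),
    ("San Francisco", "Oakland", 12.5),
    ("Oakland", "San Jose", 40.0),
]
```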

Page 9: An Introduction to Graph Databases

COMMONLY USED GRAPH ALGORITHMS

Mac Evans

Page 10: An Introduction to Graph Databases

Commonly Used Graph Algorithms...

CONNECTEDNESS: Check whether or not a set of nodes in a Graph is connected.

All of the nodes in the graph below are connected, e.g. A to B, A to C via B etc.
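A connectedness check can be sketched as a breadth-first search from one node of the set, followed by a test that every other node was reached. The graph in the slide is not part of this transcript, so the adjacency map below (including the isolated node E) is a hypothetical stand-in.

```python
from collections import deque


def are_connected(adjacency, nodes):
    """Return True if every node in `nodes` is reachable from the first one."""
    nodes = list(nodes)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return all(node in seen for node in nodes)


# Hypothetical undirected graph; E is deliberately isolated.
adjacency = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
    "E": set(),
}
```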

NODE DEGREE: The degree of a node in a network is a count of the number of connections it has to other nodes. The degree distribution is the probability distribution of these degrees in the whole network.

In the graph below, A and D have a node degree of 1. B and C have a node degree of 3.
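Node degree and degree distribution can be computed directly from an edge list. The slide's figure is not in the transcript, so the edge list below is an assumption: it uses two parallel B-C edges simply because that reproduces the degrees quoted above (1 for A and D, 3 for B and C).

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edge ends touch each node."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return dict(degrees)


def degree_distribution(degrees):
    """Fraction of nodes that have each degree value."""
    counts = Counter(degrees.values())
    total = len(degrees)
    return {degree: count / total for degree, count in counts.items()}


# Hypothetical edge list chosen to match the degrees stated on the slide.
edges = [("A", "B"), ("B", "C"), ("B", "C"), ("C", "D")]
degrees = node_degrees(edges)
distribution = degree_distribution(degrees)
```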

SHORTEST PATH: The path between two nodes that visits the fewest intermediate nodes.

In the graph above, A->B->C->D is shorter than A->B->C->B->D (disallowing loops)
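When every hop counts the same, the shortest path as defined here can be found with a breadth-first search. The sketch below assumes a simple chain A-B-C-D, which is a guess at the relevant part of the slide's graph rather than a copy of it.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return a path with the fewest hops, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the `previous` links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical undirected chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

Because each node is recorded in `previous` only once, the search never revisits B on the way to D, which is the "disallowing loops" condition mentioned above.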

Page 11: An Introduction to Graph Databases

...Commonly Used Graph Algorithms...

CENTRALITY: An assessment of the importance of a node within a network.

Degree Centrality is the simplest, being a count of the number of connections that a node has.

It may be expressed as “Indegree” (# of incoming connections) and “Outdegree” (# of outgoing connections).
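Indegree and outdegree can be counted in one pass over a list of arcs. In this sketch each arc is written as a `(tail, head)` pair pointing from tail to head; that convention, the arc list and the function name are assumptions for illustration.

```python
from collections import Counter


def degree_centrality(arcs):
    """Return {node: (indegree, outdegree)} for a list of (tail, head) arcs."""
    indegree, outdegree = Counter(), Counter()
    for tail, head in arcs:
        outdegree[tail] += 1
        indegree[head] += 1
    nodes = set(indegree) | set(outdegree)
    return {node: (indegree[node], outdegree[node]) for node in nodes}


# Hypothetical directed graph: A -> B, C -> B, B -> D.
arcs = [("A", "B"), ("C", "B"), ("B", "D")]
centrality = degree_centrality(arcs)
```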

Page 12: An Introduction to Graph Databases

...Commonly Used Graph Algorithms...

CLOSENESS CENTRALITY: Closeness considers the shortest paths between nodes and assigns a higher value to nodes that can be used to reach most other nodes most quickly.

In the graph below, node A has the greatest centrality as all other nodes can be reached in one “hop”, whereas others require 1 hop to A or 2 hops to any other node.
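The description above matches a star-shaped graph with A at the centre. As a sketch, closeness can be scored as the number of other reachable nodes divided by the sum of hop distances to them; this particular formula and the five-node star are assumptions, and the code expects a connected undirected graph.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first hop counts from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def closeness_centrality(adjacency):
    """Score = (reachable nodes - 1) / (sum of hop distances)."""
    scores = {}
    for node in adjacency:
        distances = hop_distances(adjacency, node)
        total = sum(distances.values())
        scores[node] = (len(distances) - 1) / total if total else 0.0
    return scores


# Hypothetical star: A in the middle, four leaves.
star = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"}, "C": {"A"}, "D": {"A"}, "E": {"A"},
}
scores = closeness_centrality(star)
```

In this star, A reaches every node in one hop and scores 1.0, while each leaf needs one hop to A and two hops to every other leaf, giving 4/7.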

Page 14: An Introduction to Graph Databases

...Commonly Used Graph Algorithms...

TRANSITIVE CLOSURE: The process of exploring a graph by traversing relationships until all nodes have been visited, but without revisiting nodes that are joined together in loops.

In the graph above, A->B->C->D is a transitive closure.
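The traversal described above can be sketched as a depth-first walk that remembers which nodes it has already seen. (In the stricter mathematical sense, the transitive closure is the set of all pairs of nodes where the second is reachable from the first; the walk below produces the nodes reachable from one starting node.) The example graph, with a B/C loop, is hypothetical.

```python
def traverse_once(adjacency, start):
    """Visit every node reachable from `start` exactly once."""
    visited = []
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        visited.append(current)
        # Reverse-sorted so that neighbours are visited in alphabetical order.
        for neighbour in sorted(adjacency.get(current, ()), reverse=True):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return visited


# Hypothetical directed graph containing a loop between B and C.
looped = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
order = traverse_once(looped, "A")
```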

SHORTEST PATH: The path between two nodes that visits the fewest intermediate nodes.

In the graph below, A->B->C->D is shorter than A->B->C->B->D (disallowing loops)

AVERAGE PATH LENGTH: The average of the shortest path lengths between all pairs of nodes in a graph.
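A simple way to estimate the average path length on a small graph is to run a breadth-first search from every node and average the hop counts over all connected pairs. This sketch takes "path length" to mean the shortest hop count between a pair; the four-node chain is a hypothetical example.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first hop counts from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


def average_path_length(adjacency):
    """Mean shortest hop count over all connected unordered pairs."""
    nodes = list(adjacency)
    total = pairs = 0
    for index, source in enumerate(nodes):
        distances = hop_distances(adjacency, source)
        for target in nodes[index + 1:]:
            if target in distances:
                total += distances[target]
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical undirected chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

For this chain the six pair distances are 1, 2, 3, 1, 2 and 1, so the average is 10/6.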

Page 15: An Introduction to Graph Databases

...Commonly Used Graph Algorithms...

GRAPH DIAMETER (or SPAN): The greatest distance between any pair of nodes in a graph.

It is computed by finding the shortest path between each pair of nodes. The maximum of these path lengths is a measure of the diameter of the graph.

The diameters of the two graphs below are 2 and 5.
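The computation described above can be sketched with one breadth-first search per node. The two graphs from the slide are not in the transcript, so a five-node star and a six-node chain are used here as stand-ins because they happen to have diameters 2 and 5; the code assumes each graph is connected.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Largest shortest-path hop count from `source` to any reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return max(distances.values())


def diameter(adjacency):
    """Maximum of the shortest path lengths over all pairs of nodes."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Hypothetical star: centre A with four leaves.
star = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"}, "C": {"A"}, "D": {"A"}, "E": {"A"},
}

# Hypothetical chain of six nodes: A - B - C - D - E - F.
names = list("ABCDEF")
chain = {name: set() for name in names}
for left, right in zip(names, names[1:]):
    chain[left].add(right)
    chain[right].add(left)
```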

Page 16: An Introduction to Graph Databases

...Commonly Used Graph Algorithms...

BETWEENNESS CENTRALITY: A centrality measure of a node within a graph.

Nodes that have a high probability of being visited on a randomly chosen shortest path between two randomly chosen nodes have a high “betweenness”.

In the graph below, node D has the highest betweenness centrality.
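For small undirected graphs, betweenness can be sketched by counting, for every pair of nodes, the fraction of shortest paths that pass through each other node. The version below returns raw (unnormalised) counts rather than probabilities, and the example graph is a hypothetical one in which D comes out highest; it is not the graph from the slide.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adjacency, source):
    """Return (hop distance, number of shortest paths) from `source`."""
    distance = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                paths[neighbour] = 0
                queue.append(neighbour)
            if distance[neighbour] == distance[current] + 1:
                paths[neighbour] += paths[current]
    return distance, paths


def betweenness_centrality(adjacency):
    """Sum, over node pairs, of the share of shortest paths through each node."""
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    scores = {node: 0.0 for node in adjacency}
    for s, t in combinations(sorted(adjacency), 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adjacency:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path when the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return scores


# Hypothetical undirected graph: A, B, C hang off D; D - E - F is a tail.
graph = {
    "A": {"D"}, "B": {"D"}, "C": {"D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
scores = betweenness_centrality(graph)
```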

Page 17: An Introduction to Graph Databases

GRAPH DATABASES

Page 18: An Introduction to Graph Databases

Recognizing Graphs In Object Models...

Object Class A

Tree Structures

1-to-Many

Page 19: An Introduction to Graph Databases

...Recognizing Graphs In Object Models...

Relationship Data

Tree Structures

Object Class A

Object Class A

1-to-Many

Page 20: An Introduction to Graph Databases

Recognizing Graphs In Object Models...

Tree Structures

Graph (Network) Structures

Object Class A

1-to-Many Relationship Data

Object Class A

Many-to-Many

Object Class A

Page 21: An Introduction to Graph Databases

Recognizing Graphs In Object Models...

Tree Structures

Graph (Network) Structures

Relationship Data

Object Class A

Object Class A

1-to-Many Relationship Data

Object Class A

Many-to-Many

Object Class A

Page 22: An Introduction to Graph Databases

Why Do We Need Graph DBMSs?...

Relational Database

Think about the SQL query for finding all links between the two “blue” rows... Good luck!

Table_A Table_B Table_C Table_D Table_E Table_F Table_G

Relational databases aren’t good at handling complex relationships!

Page 23: An Introduction to Graph Databases

...Graph DBMSs Are Designed To Handle Relationships

Objectivity/DB or InfiniteGraph - The solution can be found with a few lines of code

Relational Database

Think about the SQL query for finding all links between the two “blue” rows... Good luck!

A3 G4

Table_A Table_B Table_C Table_D Table_E Table_F Table_G
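To give a feel for the "few lines of code" claim, here is a generic sketch of finding all links between two records once the rows are treated as nodes and their foreign-key style references as edges. It is plain Python, not Objectivity/DB or InfiniteGraph code, and apart from the endpoint labels A3 and G4 the row names and links are invented for the example.

```python
def all_simple_paths(adjacency, start, goal, path=None):
    """Depth-first search returning every loop-free path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for neighbour in sorted(adjacency.get(start, ())):
        if neighbour not in path:
            found.extend(all_simple_paths(adjacency, neighbour, goal, path))
    return found


# Hypothetical links between rows of different tables (undirected).
links = {
    "A3": {"B1", "C2"},
    "B1": {"A3", "G4"},
    "C2": {"A3", "D5"},
    "D5": {"C2", "G4"},
    "G4": {"B1", "D5"},
}
paths = all_simple_paths(links, "A3", "G4")
```

Each extra hop in the equivalent SQL would normally mean another join, whereas the traversal simply follows references until it reaches the target.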

Page 24: An Introduction to Graph Databases

Graph Databases

• Data model:
  – Node (Vertex) and Relationship (Edge) objects
  – Directed
  – May be a hypergraph (edges with multiple endpoints)

• Examples:
  – InfiniteGraph, Neo4j, OrientDB, AllegroGraph, TitanDB and Dex

VERTEX EDGE2 N

Page 25: An Introduction to Graph Databases

Graph DBMSs Use A Very Simple Object Model

Tree Structures

Graph (Network) Structures

EDGE

VERTEX

GRAPH MODEL

Object Class A

1-to-Many Relationship Data

Object Class A

Many-to-Many

Object Class A

Relationship Data

Object Class A

Page 26: An Introduction to Graph Databases

Basic Capabilities Of Most Graph Databases...

Rapid Graph Traversal

Start

Finish

Page 27: An Introduction to Graph Databases

...Basic Capabilities Of Most Graph Databases...

Rapid Graph Traversal

Inclusive or Exclusive Selection

Diagram labels: Start, X

Page 28: An Introduction to Graph Databases

...Basic Capabilities Of Most Graph Databases

Rapid Graph Traversal

Inclusive or Exclusive Selection

Find the Shortest or All Paths Between Objects

Diagram labels: Start, Finish, X
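One plausible reading of "inclusive or exclusive selection" is a traversal that only steps onto nodes passing a filter: either an allow-list (inclusive) or a block-list (exclusive). The sketch below takes that reading as an assumption; the `allow` predicate, node names and graph are hypothetical and do not reflect any specific product's API.

```python
from collections import deque


def traverse(adjacency, start, allow=lambda node: True):
    """Breadth-first traversal that only follows nodes accepted by `allow`."""
    reached = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        reached.append(current)
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in seen and allow(neighbour):
                seen.add(neighbour)
                queue.append(neighbour)
    return reached


# Hypothetical directed graph.
adjacency = {
    "Start": ["P", "X1"],
    "P": ["R"],
    "X1": ["S"],
    "R": [],
    "S": [],
}

# Exclusive selection: skip anything on the block-list.
excluded = {"X1"}
without_x1 = traverse(adjacency, "Start", allow=lambda n: n not in excluded)

# Inclusive selection: only step onto nodes on the allow-list.
included = {"P", "X1", "S"}
only_listed = traverse(adjacency, "Start", allow=lambda n: n in included)
```

Excluding X1 also cuts off S, because S can only be reached through X1.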

Page 29: An Introduction to Graph Databases

InfiniteGraph Capabilities

Parallel Graph Traversal

Inclusive or Exclusive Selection

Shortest or All Paths Between Objects

Computational & Visualization Plug-Ins

Compute Cost To Date

Visualize

Diagram labels: Start, Finish, X

Copyright © Objectivity, Inc. 2013

Page 30: An Introduction to Graph Databases

CURRENT IMPLEMENTATIONS

Page 31: An Introduction to Graph Databases

Graph Databases Pre-2003

Page 32: An Introduction to Graph Databases

Graph Databases Post-2003

X

Titan

Page 33: An Introduction to Graph Databases

Graph Databases Compared [UNSW]

DATA STORAGE FEATURES

Page 34: An Introduction to Graph Databases

Graph Databases Compared [DZone]

Source: http://goo.gl/ni4eoE

Page 35: An Introduction to Graph Databases

Graph Databases – Pros and Cons

• Strengths:
  – Extremely fast for connected data
  – Scales out, typically
  – Easy to query (navigation)
  – Simple data model

• Weaknesses:
  – May not support distribution or sharding
  – Requires conceptual shift... a different way of thinking

VERTEX EDGE2 N

Page 36: An Introduction to Graph Databases

USE CASES

Page 37: An Introduction to Graph Databases

Example 1 - Market Analysis

The 10 companies that control a majority of U.S. consumer goods brands

Page 38: An Introduction to Graph Databases

Example 2 - Demographics

Used in social network analysis, marketing, medical research etc.

Page 39: An Introduction to Graph Databases

Example 3 - Seed To Consumer Tracking

?

Page 40: An Introduction to Graph Databases

Example 4 - Ad Placement Networks

Smartphone Ad placement - based on the user’s profile and location data captured by opt-in applications.

• The location data can be stored and distilled in a key-value and column store hybrid database, such as Cassandra

• The locations are matched with geospatial data to deduce user interests.

• As Ad placement orders arrive, an application built on a graph database, such as InfiniteGraph, matches groups of users with Ads:

• Maximizes relevance for the user.

• Yields maximum value for the advertiser and the placer.
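The matching step can be pictured as a two-hop walk through a graph: from an Ad to the interests it targets, and from each interest to the users who share it. The following is a toy sketch of that idea only; the user names, interests, ad names and the `match_ads` function are all invented, and a real system would add ranking, location and scale concerns that are not shown.

```python
from collections import defaultdict


def match_ads(user_interests, ad_interests):
    """Return {ad: set of users sharing at least one targeted interest}."""
    # Index the user -> interest edges in the reverse direction.
    interest_to_users = defaultdict(set)
    for user, interests in user_interests.items():
        for interest in interests:
            interest_to_users[interest].add(user)

    matches = {}
    for ad, interests in ad_interests.items():
        users = set()
        for interest in interests:
            users |= interest_to_users.get(interest, set())
        matches[ad] = users
    return matches


# Hypothetical interests deduced from location data.
user_interests = {
    "user1": {"coffee", "cycling"},
    "user2": {"cycling"},
    "user3": {"cinema"},
}
# Hypothetical Ad placement orders and the interests they target.
ad_interests = {
    "bike_shop_ad": {"cycling"},
    "cafe_ad": {"coffee"},
    "opera_ad": {"opera"},
}
matches = match_ads(user_interests, ad_interests)
```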

Page 42: An Introduction to Graph Databases

Example 5 - Healthcare Informatics

• Problem: Physicians need better electronic records to manage patient data on a global basis and to match symptoms, causes, treatments and interdependencies in order to improve diagnoses and outcomes.

• Solution: Create a database that leverages the existing architecture, using NoSQL tools such as Objectivity/DB and InfiniteGraph, and that can handle data capture, symptoms, diagnoses, treatments, reactions to medications, interactions and progress.

• Result: It works:
  – Diagnosis is faster and more accurate.
  – The knowledge base tracks similar medical cases.
  – Treatment success rates have improved.

Page 43: An Introduction to Graph Databases

Example 6 - Big Data Analytics

Page 44: An Introduction to Graph Databases

Example 7 – Visual Analytics

Page 45: An Introduction to Graph Databases

Hands On With A Graph Database

• We'll be using InfiniteGraph today

• You'll need a Java Development environment on your machine

• If you haven't downloaded InfiniteGraph already, please go to:

http://goo.gl/XzJo6T [https://download.infinitegraph.com/index.aspx]

• We'll be covering a HelloGraph and a more complex sample program

