Titan NYC Meetup March 2014

AURELIUS | THINKAURELIUS.COM | TITAN: Scalable Graph Database | Matthias Broecheler (@mbroecheler) | March 6th, MMXIII

Upload: matthias-broecheler

Post on 15-Jan-2015


DESCRIPTION

Slides from the meetup presentation in NYC (March 2014). Covers the current version of Titan and Faunus.

TRANSCRIPT

Page 1: Titan NYC Meetup March 2014

AURELIUS | THINKAURELIUS.COM

TITAN: Scalable Graph Database

Matthias Broecheler, @mbroecheler, March 6th, MMXIII

Page 2: Titan NYC Meetup March 2014

Graph Database

distributed

real time

open source

Page 3: Titan NYC Meetup March 2014

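[Figure: the property graph model. Two vertices (name: Hercules, type: demigod; and name: Cerberus, type: monster) are joined by an edge with the edge label "battled" and the property time: 12. Callouts: Vertex, Edge, Edge Label, Property = key + value.]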
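The model on this slide can be sketched in a few lines of Python. This is only an illustration of the vocabulary (vertex, edge, edge label, property), not Titan's implementation; the class names `Vertex` and `Edge` and the direction of the edge (Hercules to Cerberus) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    # A property is a key plus a value, so a dict is enough here.
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str            # the edge label, e.g. "battled"
    out_vertex: Vertex    # vertex the edge leaves (assumed: Hercules)
    in_vertex: Vertex     # vertex the edge enters (assumed: Cerberus)
    properties: dict = field(default_factory=dict)


hercules = Vertex({"name": "Hercules", "type": "demigod"})
cerberus = Vertex({"name": "Cerberus", "type": "monster"})
battled = Edge("battled", hercules, cerberus, {"time": 12})

print(battled.out_vertex.properties["name"],
      battled.label,
      battled.in_vertex.properties["name"],
      battled.properties)
```

Edges carry properties in the same way vertices do, which is what later slides rely on when they filter edges by `time`.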

Page 4: Titan NYC Meetup March 2014

[Figure: the example graph.
Vertices:
name: Jupiter, type: god
name: Hercules, type: demigod
name: Cerberus, type: monster
name: Pluto, type: god, age: 4000
name: Neptune, type: god, age: 4500
name: Alcmene, type: human, age: 45
name: Saturn, type: titan, age: 10000
name: Hydra, type: monster
Edge labels: father (twice), mother, brother (twice), pet, battled (twice, with time: 12 and time: 2)]

Page 5: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4]

g.V

g.E

Page 6: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4, with the Hercules vertex marked v]

v = g.V.has('name','Hercules')

Page 7: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4, with the vertex v marked]

v.out('father','mother')

Page 8: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4, with the vertex v marked]

v.out('father').out('brother').name

Page 9: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4, with the vertex v marked]

v.outE('battled').has('time',T.gt,5).inV.name
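To make the three steps of this traversal explicit (take the outgoing "battled" edges, keep those with time greater than 5, then move to the vertex each edge points to), here is a minimal Python sketch. The edge list is an assumption read off the figure labels: the slides only show that one battled edge has time: 12 and another has time: 2, so the endpoints below are illustrative.

```python
# (out vertex, label, in vertex, edge properties); endpoints are assumed
EDGES = [
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Hercules", "battled", "Hydra", {"time": 2}),
]


def out_e(vertex, label):
    """Outgoing edges of `vertex` with the given label (the outE step)."""
    return [e for e in EDGES if e[0] == vertex and e[1] == label]


def battled_after(vertex, min_time):
    """Names reached over 'battled' edges whose time exceeds `min_time`."""
    return [head
            for (_, _, head, props) in out_e(vertex, "battled")
            if props["time"] > min_time]


result = battled_after("Hercules", 5)
print(result)
```

The filter runs on the edge, before the neighbouring vertex is touched, which is the property the vertex-centric indices later in the deck exploit.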

Page 10: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4]

g.V.has('age',T.gt,4200)

Page 11: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4]

g.E.has('time',T.lt,5)

Page 12: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4]

saturn.as('x').in('father').loop('x'){it.loops < 3}.next()
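As far as I understand the TinkerPop 2 loop step, the condition `it.loops < 3` here has the effect of applying the looped section, `in('father')`, twice, so the query walks from Saturn to his grandchildren. The Python sketch below mimics that under the assumption that Jupiter's father is Saturn and Hercules's father is Jupiter; the slides only show two "father" edges without spelling out their endpoints.

```python
# child -> father; the endpoints are assumptions, not stated on the slide
FATHER = {"Hercules": "Jupiter", "Jupiter": "Saturn"}


def in_father(vertices):
    """One in('father') step: everyone whose father is in `vertices`."""
    return [child for child, father in FATHER.items() if father in vertices]


def loop(start, step, times):
    """Apply `step` to the frontier `times` times, starting from one vertex."""
    frontier = [start]
    for _ in range(times):
        frontier = step(frontier)
    return frontier


grandchildren = loop("Saturn", in_father, 2)
print(grandchildren)
```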

Page 13: Titan NYC Meetup March 2014

[Figure: the example graph from Page 4]

g.V.sideEffect{it.rank = it.both.both.both.count()}
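The rank computed here is the number of three-step walks that start at each vertex, following edges in either direction. A small Python sketch of the same count, on a hypothetical three-vertex path graph rather than the graph from the slides:

```python
from collections import defaultdict


def three_step_walks(edges):
    """For every vertex, count the walks of length 3 that start there,
    ignoring edge direction (the both.both.both part of the query)."""
    neighbors = defaultdict(list)
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    rank = {}
    for v in list(neighbors):
        frontier = [v]
        for _ in range(3):
            # Walks may revisit vertices, so duplicates are kept on purpose.
            frontier = [n for u in frontier for n in neighbors[u]]
        rank[v] = len(frontier)
    return rank


rank = three_step_walks([("a", "b"), ("b", "c")])  # hypothetical path a-b-c
print(rank)
```

Because every vertex fans out to all of its neighbours at each step, the work grows quickly with vertex degree, which is why the deck treats this kind of global computation as a batch job.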

Page 14: Titan NYC Meetup March 2014

Titan Database: Architecture Overview

Page 15: Titan NYC Meetup March 2014

Titan Features

I. Data Management

II. Vertex-Centric Indices

Page 16: Titan NYC Meetup March 2014

Titan Features

III. Graph Partitioning

IV. Edge Compression

Page 17: Titan NYC Meetup March 2014

Architecture Analogy

MyISAM

Page 18: Titan NYC Meetup March 2014

Flexible Persistence

Partitionability

Availability

Consistency

Page 19: Titan NYC Meetup March 2014

g.E.has('location',WITHIN,Geoshape.circle(38,24,50))

Full text & Geo Search

Page 20: Titan NYC Meetup March 2014

I. Navigate Memory

Page 21: Titan NYC Meetup March 2014

Sequential Data Access

Page 22: Titan NYC Meetup March 2014

II. Manage Concurrency

Multiple users, units of work

Atomicity, Isolation, Consistency, Distribution

Transactions

Page 23: Titan NYC Meetup March 2014

Vertex Representation

[Figure: a single storage row with the row key labeled 5. Its cells, in order: Property, Property, Out-Edge, In-Edge, Out-Edge, In-Edge, In-Edge. Annotations: cell = column + value; byte order sorting; row indices for fast vertex-centric queries.]

Page 24: Titan NYC Meetup March 2014

Titan Storage Model

Adjacency list in one column family
Row key = vertex id
Each property and edge in one column
Denormalized, i.e. stored twice
Direction and label/key as column prefix
Use slice predicate for quick retrieval
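The points above can be sketched with a toy row store in Python. Everything about the byte layout is an assumption made for illustration (the direction markers, the `\x00` separator, the fixed 8-byte ids); the slides only state that direction and label form the column prefix, that each edge is stored twice, and that a slice over sorted columns retrieves matching edges. Property columns are left out to keep the sketch short.

```python
import bisect

OUT, IN = b"\x00", b"\x01"   # hypothetical direction markers

rows = {}                    # row key (vertex id) -> sorted list of columns


def column(direction, label, other_vertex_id):
    """Column name: direction and label as prefix, then the other vertex id."""
    return direction + label.encode() + b"\x00" + other_vertex_id.to_bytes(8, "big")


def add_edge(out_id, label, in_id):
    """Denormalized: the edge is written to the rows of both endpoints."""
    bisect.insort(rows.setdefault(out_id, []), column(OUT, label, in_id))
    bisect.insort(rows.setdefault(in_id, []), column(IN, label, out_id))


def slice_prefix(sorted_columns, prefix):
    """Return the contiguous run of columns that start with `prefix`."""
    lo = bisect.bisect_left(sorted_columns, prefix)
    hi = lo
    while hi < len(sorted_columns) and sorted_columns[hi].startswith(prefix):
        hi += 1
    return sorted_columns[lo:hi]


add_edge(5, "battled", 7)
add_edge(5, "father", 3)
add_edge(9, "battled", 5)

out_battled = slice_prefix(rows[5], OUT + b"battled\x00")
print([int.from_bytes(c[-8:], "big") for c in out_battled])
```

Because the columns of a row are kept in byte order, all outgoing "battled" edges of vertex 5 sit next to each other and can be read without touching the row's other edges.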

Page 25: Titan NYC Meetup March 2014

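Edge Representation

[Figure: one edge cell, laid out left to right as: label id + direction, sort key, Δ vertex id, Δ edge id, signature properties, other properties. The cell is divided into a Column part and a Value part.]

compressed serialized objects

variable long encoding

Properties & Edges are atomic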
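Why storing a Δ (difference) together with a variable-length encoding saves space can be seen in a short Python sketch. The slides do not specify Titan's exact byte format, so the standard base-128 varint below is a stand-in, and the two ids are made-up values.

```python
def encode_varint(n):
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # last byte
            return bytes(out)


def decode_varint(data):
    """Inverse of encode_varint for a single encoded value."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


vertex_id = 1_000_000_000_005      # hypothetical id of the row's vertex
neighbor_id = 1_000_000_000_042    # hypothetical id of the adjacent vertex
delta = neighbor_id - vertex_id

print(len(encode_varint(neighbor_id)), "bytes for the absolute id")
print(len(encode_varint(delta)), "byte(s) for the delta")
```

The saving depends on neighbouring vertices having nearby ids, which ties this slide to the id assignment discussed under graph partitioning.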

Page 26: Titan NYC Meetup March 2014

Vertex-Centric Indices

Sort and index edges per vertex by sort key
Sort key can be composite
Enables efficient focused traversals
Only retrieve edges that matter
Uses push-down predicates for quick, index-driven retrieval
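A minimal sketch of the idea in Python: if one vertex's edges are kept sorted by (label, sort key), a predicate such as "battled edges with time below 5" becomes two binary searches instead of a scan over every edge. The edge ids and the list layout are hypothetical; only the labels and time values echo the slides that follow.

```python
import bisect

# Edges of a single vertex as (label, sort key, edge id), kept sorted.
edges = sorted([
    ("battled", 1, "e1"), ("battled", 3, "e2"),
    ("battled", 5, "e3"), ("battled", 9, "e4"),
    ("father", 0, "e5"), ("mother", 0, "e6"),
])


def edges_with_key_below(sorted_edges, label, limit):
    """Edges with the given label whose sort key is strictly below `limit`."""
    lo = bisect.bisect_left(sorted_edges, (label,))        # first edge with this label
    hi = bisect.bisect_left(sorted_edges, (label, limit))  # first edge with key >= limit
    return sorted_edges[lo:hi]


result = edges_with_key_below(edges, "battled", 5)
print([edge_id for _, _, edge_id in result])
```

Only the matching slice is read, so the cost depends on the number of edges returned rather than on the total degree of the vertex.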

Page 27: Titan NYC Meetup March 2014

[Figure: vertex v with all of its edges: fought, fought, father, mother, and four battled edges with time: 1, time: 3, time: 5 and time: 9]

v.query()

Page 28: Titan NYC Meetup March 2014

[Figure: vertex v with the remaining edges: father, mother, and four battled edges with time: 1, time: 3, time: 5 and time: 9]

v.query().direction(OUT)

Page 29: Titan NYC Meetup March 2014

[Figure: vertex v with the remaining edges: four battled edges with time: 1, time: 3, time: 5 and time: 9]

v.query().direction(OUT).labels('battled')

Page 30: Titan NYC Meetup March 2014

[Figure: vertex v with the remaining edges: two battled edges with time: 1 and time: 3]

v.query().direction(OUT).labels('battled').has('time',T.lt,5)

Page 31: Titan NYC Meetup March 2014

[Figure: vertex v with the remaining edges: two battled edges with time: 1 and time: 3]

v.query().direction(OUT).labels('battled').has('time',T.lt,5)

=

v.outE('battled').has('time',T.lt,5).inV

Query Optimization

Page 32: Titan NYC Meetup March 2014

Consistency

On eventually consistent storage backends, Titan can enforce consistency constraints by configuring types with UniquenessConsistency.LOCK
Titan acquires locks to avoid conflicting changes
Acquiring locks is expensive: use with care
Locking protocol used is configurable
Reasonably safe implementation, not completely fail-safe

Page 33: Titan NYC Meetup March 2014

Token Ring

Graph Partitioning

assigns ids to map vertices into “optimal” token range

Lots of interesting questions for future work

uses BOP (ByteOrderedPartitioner)

Page 34: Titan NYC Meetup March 2014
Page 35: Titan NYC Meetup March 2014

Educating the Planet

Page 36: Titan NYC Meetup March 2014

[Figure: graph schema. Vertex types: Person, Student, Teacher, Course, Institution, Concept, Discussion, Comment, Share. Edge labels: enrolledIn, teaches, relatesTo, hasCourse, belongsTo, follows, author, references, hasComment, partOf.]

Page 37: Titan NYC Meetup March 2014

121 Billion Edges
6.2 Billion Vertices

1 Million Universities
3.5 Billion Students

Page 38: Titan NYC Meetup March 2014

Placement Group

hi1.4xl

Setup

Page 39: Titan NYC Meetup March 2014

1.1 million edges / sec

using batch mode

Data Ingestion
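A back-of-the-envelope reading of this figure, assuming (the slides do not say so) that the 1.1 million edges/sec rate was sustained while loading all 121 billion edges quoted on Page 37:

```python
total_edges = 121e9        # from Page 37
edges_per_second = 1.1e6   # from this slide, batch mode

seconds = total_edges / edges_per_second
hours = seconds / 3600

print(f"{seconds:,.0f} seconds, about {hours:.1f} hours")
```

Under that assumption the full load takes on the order of a day and a half of wall-clock time; the actual duration is not stated in the deck.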

Page 40: Titan NYC Meetup March 2014

80 m1.medium

Page 41: Titan NYC Meetup March 2014

10,200 transactions / sec

16 randomly chosen complex traversal templates

Throughput

Page 42: Titan NYC Meetup March 2014

Titan Local Caching

Page 43: Titan NYC Meetup March 2014
Page 44: Titan NYC Meetup March 2014
Page 45: Titan NYC Meetup March 2014

Flexible Persistence

Partitionability

Availability

Consistency

Page 46: Titan NYC Meetup March 2014

Local Deployment

Application + Titan
Storage Backend

Application + Titan + Storage Backend (embedded)

Page 47: Titan NYC Meetup March 2014

Remote Deployment

Application + Titan

Storage Backend Cluster

Page 48: Titan NYC Meetup March 2014

Server Deployment II

Application

Cluster of: (2 JVM)
- Titan + Rexster
- Storage Backend (via localhost)

Page 49: Titan NYC Meetup March 2014

Native Blueprints Implementation

Gremlin Query Language

Rexster Server: any Titan graph can be exposed as a REST endpoint

Titan Ecosystem

Page 50: Titan NYC Meetup March 2014

Faunus: Batch Graph Analytics

Page 51: Titan NYC Meetup March 2014

Hadoop-based Graph Computing Framework

Graph Analytics
Breadth-first Traversals
Global Graph Computations
Batch Big Graph Data

Faunus Features

Page 52: Titan NYC Meetup March 2014

Faunus Architecture

g._()

Page 53: Titan NYC Meetup March 2014

Faunus Work Flow

Compressed HDFS Graphs
stored in sequence files
variable length encoding
prefix compression

Page 54: Titan NYC Meetup March 2014

Degree Distribution

GitHub Network

g.V.sideEffect{it.degree = it.out('follows').count()}.degree.groupCount
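The query has two stages: compute each vertex's out-degree over "follows" edges, then count how many vertices share each degree. A small Python sketch of both stages, on a made-up follower list rather than the GitHub data:

```python
from collections import Counter

# (follower, followed); hypothetical sample data
follows = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("dan", "cat")]


def degree_distribution(follow_edges):
    """Map each out-degree to the number of users that have it."""
    out_degree = Counter(src for src, _ in follow_edges)
    users = {user for edge in follow_edges for user in edge}
    # Counter returns 0 for users who follow nobody.
    return Counter(out_degree[user] for user in users)


dist = degree_distribution(follows)
print(sorted(dist.items()))
```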

Page 55: Titan NYC Meetup March 2014

Degree Distribution

P(k) ~ k^(-γ)

γ = 2.2
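One simple way to read an exponent off a degree distribution is a least-squares line through the points (log k, log P(k)); the slope is then minus γ. The sketch below applies that to synthetic data generated with γ = 2.2. It is an illustration of the relationship, not necessarily the fitting method used for the slide, and on real, noisy data this naive fit is known to be biased.

```python
import math


def fit_gamma(distribution):
    """Least-squares slope of log P(k) against log k; returns gamma = -slope."""
    points = [(math.log(k), math.log(p))
              for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return -slope


# Synthetic, noise-free power law with the exponent quoted on the slide.
synthetic = {k: k ** -2.2 for k in range(1, 101)}
gamma = fit_gamma(synthetic)
print(round(gamma, 6))
```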

Page 56: Titan NYC Meetup March 2014

Global Recommendations

gremlin> g.E.has('label','pushed','to').keep.V.out('pushed').out('to').in('to').in('pushed').sideEffect('{it.score = it.pathCounter}').score.order(F.decr,'name')

# Top 5:
Jippi 60892182927
garbear 30095282886
FakeHeal 30038040349
brianchandotcom 24684133382
nyarla 15230275746

Page 57: Titan NYC Meetup March 2014

Big Picture: Closing Thoughts

Page 58: Titan NYC Meetup March 2014

Why Graph Databases?

[Figure: data models placed along an axis labeled "Value in Relationships", running from low to high: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph.]

Page 59: Titan NYC Meetup March 2014

The value of data is proportional to the

number of meaningful relationships

Page 60: Titan NYC Meetup March 2014

Social Networks

Page 61: Titan NYC Meetup March 2014

Recommendations

Path Finding

Page 62: Titan NYC Meetup March 2014

Graph Search

Page 63: Titan NYC Meetup March 2014

Knowledge Graph

Page 64: Titan NYC Meetup March 2014

Markets & Risks

Page 65: Titan NYC Meetup March 2014

ECONOMY

Page 66: Titan NYC Meetup March 2014

Health & Medicine

Page 67: Titan NYC Meetup March 2014

HEALTH

Page 68: Titan NYC Meetup March 2014

[Timeline:
June 14th 2012: Alpha Release
September 2012: Titan 0.1.0
December 2012: Titan 0.2.0
March 2013: Titan 0.3.0
November 2013: Titan 0.4.0
Annotations: Experimental release of a distributed, open-source graph database; First stable release; Rewrite of core; Indexing & ElasticSearch; Performance; Feature Extension; Fulgora; Faunus Release]

Page 69: Titan NYC Meetup March 2014

What’s Coming

Creating and updating indexes
Vertex-centric indexes
Graph indexes
Log integration
Tighter Titan-Faunus Integration
Graph Partitioning
Declarative Query Answering
Usability Improvements

Page 70: Titan NYC Meetup March 2014

Aurelius Graph Cluster

OLTP OLAP

Hadoop MapReduce

Analysis results back into Titan

Apache 2

g.V.label.groupCount

g.v(101).out

titan.thinkaurelius.com

faunus.thinkaurelius.com

[email protected]

Page 71: Titan NYC Meetup March 2014

AURELIUS | THINKAURELIUS.COM

@AURELIUSGRAPHS