C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias Broecheler

Uploaded by planet-cassandra on 15-Jan-2015

DESCRIPTION

This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century. Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine built on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster, which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.

TRANSCRIPT

Page 1

AURELIUS THINKAURELIUS.COM

TITAN Distributed Graph Computing

Matthias Broecheler, CTO @mbroecheler June XI, MMXIII

#CASSANDRA13

Page 3

Thank You!

Pull requests, feature suggestions, bug reports, community support

Page 4

June 14th 2012: Alpha Release (experimental release of a distributed, open source graph database)
September 2012: Titan 0.1.0 (first stable release)
December 2012: Titan 0.2.0
March 2013: Titan 0.3.0 (rewrite of core, indexing + ElasticSearch)
May 2013: Titan 0.3.1 (performance, bug fixing)

Page 5

Faunus Release

Page 6

Titan

Graph Database: distributed, real time, open source

Page 7

[Figure: two vertices (name: Hercules, type: demigod) and (name: Cerberus, type: monster) connected by an edge labeled battled that carries the property time: 12. Callouts identify a Vertex, an Edge, its Edge Label, and a Property.]
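The property graph model on this slide can be sketched as two small Python data classes. In this model, vertices and edges both carry key/value properties, and every edge also carries a label. The sketch is an illustration of the model only and is not Titan's API. The class names, field names, and vertex ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int                                          # hypothetical vertex id
    properties: dict = field(default_factory=dict)   # e.g. name, type


@dataclass
class Edge:
    out_vertex: Vertex                               # tail of the directed edge
    label: str                                       # edge label, e.g. "battled"
    in_vertex: Vertex                                # head of the directed edge
    properties: dict = field(default_factory=dict)   # e.g. time


hercules = Vertex(1, {"name": "Hercules", "type": "demigod"})
cerberus = Vertex(2, {"name": "Cerberus", "type": "monster"})
battle = Edge(hercules, "battled", cerberus, {"time": 12})

print(f"{battle.out_vertex.properties['name']} {battle.label} "
      f"{battle.in_vertex.properties['name']} at time {battle.properties['time']}")
# prints: Hercules battled Cerberus at time 12
```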

Page 8

When should you use a Graph Database?

[Chart: data models ordered by the value they place in relationships, from low to high: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph.]

Page 9

Educating the Planet

Page 11

[Diagram: schema of the education social network.
Vertex types: Person, Student, Teacher, Course, Institution, Concept, Discussion, Comment, Share.
Edge labels: enrolledIn, teaches, hasCourse, belongsTo, follows, author, references, hasComment, partOf, relatesTo.]
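One way to read this schema is as a set of typed edge labels. The sketch below checks a proposed edge against a table of allowed (source type, label, target type) triples. The slide names the types and labels but not which types each label connects. The endpoints below are therefore assumptions inferred from the label names. Treating Student and Teacher as subtypes of Person is also an assumption. The helpers are hypothetical and not part of Titan.

```python
# Hypothetical typing rules for the education graph. Endpoints are guessed
# from the label names only; the slide does not state them.
SUPERTYPE = {"Student": "Person", "Teacher": "Person"}  # assumed subtyping

ALLOWED_EDGES = {
    ("Student", "enrolledIn", "Course"),
    ("Teacher", "teaches", "Course"),
    ("Institution", "hasCourse", "Course"),
    ("Person", "follows", "Person"),
}


def type_chain(vertex_type):
    """Yield a vertex type followed by each of its supertypes."""
    while vertex_type is not None:
        yield vertex_type
        vertex_type = SUPERTYPE.get(vertex_type)


def edge_allowed(src_type, label, dst_type):
    """True if the types, or any of their supertypes, allow this label."""
    return any(
        (src, label, dst) in ALLOWED_EDGES
        for src in type_chain(src_type)
        for dst in type_chain(dst_type)
    )


print(edge_allowed("Student", "follows", "Teacher"))    # True: both are Persons
print(edge_allowed("Teacher", "enrolledIn", "Course"))  # False under these rules
```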

Page 13

Titan

Integrative Data Model in a polyglot storage world

Page 14

[Diagram: the same education schema (vertex types Student, Person, Teacher, Course, Institution, Concept, Discussion, Comment, Share; edge labels enrolledIn, teaches, hasCourse, belongsTo, follows, author, references, hasComment, partOf, relatesTo), now annotated with DiscussionRank.]

Page 15

Titan

Analyze Relationships in real time

Page 16

Scaling Titan

number of transactions

size of the graph

Page 17

121 Billion Edges, 6.2 Billion Vertices

1 Million Universities

Page 18

Placement Group

hi1.4xl

Page 19

Data Ingestion

1.1 million edges / sec using batch mode
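Batch mode trades many small writes for fewer large ones. The sketch below shows only that buffering idea. Edges are accumulated and handed, in fixed-size batches, to a write function supplied by the caller. The `BatchWriter` class, the batch size, and the flush callback are hypothetical. Titan's own batch-loading path is configured inside Titan and is not reproduced here.

```python
from typing import Callable, List, Tuple

EdgeTuple = Tuple[int, str, int]  # (out vertex id, label, in vertex id)


class BatchWriter:
    """Buffers edges and hands them to `flush_fn` in batches of `batch_size`."""

    def __init__(self, flush_fn: Callable[[List[EdgeTuple]], None],
                 batch_size: int = 1000):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: List[EdgeTuple] = []

    def add_edge(self, out_id: int, label: str, in_id: int) -> None:
        self.buffer.append((out_id, label, in_id))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []  # start a fresh list; the flushed one is kept by the callee

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.flush()  # write any remaining partial batch
        return False


batches = []
with BatchWriter(batches.append, batch_size=3) as writer:
    for i in range(7):
        writer.add_edge(i, "follows", i + 1)

print([len(b) for b in batches])  # [3, 3, 1]
```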

Page 20

80 m1.medium

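Page 21

Follow Recommendation

    // Assumes `user` (start vertex), `num` (recommendations wanted) and `limit` (cap on candidates) are bound.
    x = [user] as Set   // seeding with `user` keeps the user from being recommended to themselves
    m = user.out('follows').aggregate(x)[0..(num*2-1)]   // x collects all followees; continue with the first num*2
            .out('follows').except(x)[0..limit]         // friends-of-friends not already followed
            .groupCount.cap.next()                      // Map: candidate vertex -> number of paths reaching it
    m.sort{ -it.value }                                 // highest counts first
     .take(num)
     .collect{ [userid: it.key.id, points: it.value] }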
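To make the traversal's logic easy to check, here is a plain-Python rendering over an in-memory adjacency dict. It mirrors the Gremlin steps as read from the slide:

1. Remember every user already followed.
2. Walk from up to num*2 followees.
3. Take one more `follows` hop.
4. Drop users already followed, and also the user themselves.
5. Count how often each candidate is reached.
6. Keep the top `num`.

The `follows` dict, the function name, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def recommend_follows(follows: Dict[str, List[str]], user: str,
                      num: int, limit: int) -> List[dict]:
    """Friend-of-friend follow recommendation over an adjacency dict."""
    already = set(follows.get(user, []))           # like aggregate(x): all followees
    first_hop = follows.get(user, [])[:num * 2]    # like [0..(num*2-1)]
    visits = [
        candidate
        for followee in first_hop
        for candidate in follows.get(followee, [])
        if candidate not in already and candidate != user   # like except(x)
    ][:limit + 1]                                  # like [0..limit]; Gremlin ranges are inclusive
    counts = Counter(visits)                       # like groupCount
    # most_common keeps first-seen order among equal counts.
    return [{"userid": u, "points": p} for u, p in counts.most_common(num)]


follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave", "erin", "alice"],
    "carol": ["dave", "frank"],
}
print(recommend_follows(follows, "alice", num=2, limit=10))
# [{'userid': 'dave', 'points': 2}, {'userid': 'erin', 'points': 1}]
```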

Page 22

Titan's Ecosystem

Generic Graph API
Dataflow Processing
Traversal Language (query language)
Object-Graph Mapper
Graph Algorithms (cool stuff coming)
Graph Server (REST + JSON)

Page 23

Throughput

10,200 transactions / sec

16 randomly chosen complex traversal templates

Page 24

| Transaction description | Avg (ms) | Stdev (ms) |
|---|---|---|
| Student retrieves all content for a single course in their course list | 279.32 | 81.83 |
| Student follows another student | 193.72 | 22.77 |
| Student is recommended people to follow | 241.33 | 256.48 |
| Student reads their stream and shares an item with followers | 284.07 | 68.20 |
| Student retrieves their profile | 53.74 | 22.61 |
| Student reads the most recent comments for their courses | 211.07 | 45.56 |
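Each row of the latency table summarizes many timed executions of one transaction type by an average and a standard deviation. For reference, the sketch below computes such a summary from raw samples. The slide does not say whether the benchmark used the sample or the population standard deviation. Using `statistics.stdev` (sample) is therefore an assumption, and the timings are hypothetical.

```python
import statistics


def summarize(latencies_ms):
    """Return (mean, sample standard deviation) in ms, rounded to 2 places."""
    return (round(statistics.mean(latencies_ms), 2),
            round(statistics.stdev(latencies_ms), 2))


# Hypothetical timings for one transaction type, in milliseconds.
samples = [50.0, 55.0, 60.0, 45.0, 40.0]
print(summarize(samples))  # (50.0, 7.91)
```

Latencies cannot be negative, so a standard deviation larger than the mean suggests a long right tail rather than a symmetric spread. The recommendation row (241.33 ms average, 256.48 ms standard deviation) is an example.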

Page 25

Scaling Titan

technical perspective

Page 26

Vertex Representation

[Diagram: vertex 5 (name: Hercules, type: demigod) stored as a single row. The row holds two Property cells followed by five Edge cells labeled mother, battled, battled, battled, and fought, which point to vertices 8, 4, 9, 2, and 7. Three of the edges carry the properties time: 1, time: 4, and time: 7.]

Cells are kept in induced order: row indices for fast vertex centric queries.
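The slide's point is that a vertex's properties and edges live in one row, with cells kept in a sorted ("induced") order. A vertex-centric query then becomes a range scan over that row instead of a full scan. The sketch below models one row as a sorted list of (label, time, neighbor id) keys. It answers "which neighbors does this vertex have via `battled` between times t1 and t2" with binary search. The key layout and the sample cells are assumptions for illustration, not Titan's on-disk format.

```python
import bisect
from typing import List, Tuple

# A cell key is (edge label, time property, neighbor vertex id).
Cell = Tuple[str, int, int]


def build_row(cells: List[Cell]) -> List[Cell]:
    """Keep a vertex's cells sorted so edges with the same label are contiguous."""
    return sorted(cells)


def edges_in_range(row: List[Cell], label: str, t_lo: int, t_hi: int) -> List[int]:
    """Neighbor ids of `label` edges with t_lo <= time <= t_hi, found by binary search."""
    start = bisect.bisect_left(row, (label, t_lo, float("-inf")))
    end = bisect.bisect_right(row, (label, t_hi, float("inf")))
    return [neighbor for _, _, neighbor in row[start:end]]


# Hypothetical row for one vertex; labels, times and ids are illustrative only.
row = build_row([
    ("battled", 7, 7),
    ("fought", 3, 4),
    ("battled", 1, 9),
    ("battled", 4, 2),
])

print(edges_in_range(row, "battled", 2, 7))  # [2, 7]
```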

Page 27

Edge Representation

Column: label id + direction | primary key | edge id | vertex id Δ
Value: signature properties | other properties

Annotations: variable long encoding (ids), compressed serialized objects (properties)
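"Variable long encoding" means storing an integer in only as many bytes as its magnitude needs. The Δ suggests that the neighbor id is stored as a difference from the row's own vertex id rather than in full. The sketch below combines a standard unsigned LEB128-style varint with zig-zag mapping for signed deltas. Both are generic techniques chosen for illustration. Titan's exact byte layout may differ, and the helper names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def encode_varint(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte; the high bit means more bytes follow."""
    if value < 0:
        raise ValueError("varint encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode one unsigned LEB128 value from the start of `data`."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


def encode_neighbor(vertex_id: int, neighbor_id: int) -> bytes:
    """Store a neighbor as a zig-zagged delta from the row's own vertex id."""
    return encode_varint(zigzag(neighbor_id - vertex_id))


def decode_neighbor(vertex_id: int, data: bytes) -> int:
    return vertex_id + unzigzag(decode_varint(data))


print(len(encode_varint(1_000_003)))               # 3 bytes for the full id
print(len(encode_neighbor(1_000_000, 1_000_003)))  # 1 byte for the delta
```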

Page 28

Graph Partitioning

Token Ring

Assigns ids to map vertices into an “optimal” token range; uses BOP (ByteOrderedPartitioner).

Lots of interesting questions for future work.
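With an order-preserving partitioner such as BOP, the byte order of a row key decides which token range, and therefore which node, stores a vertex. The sketch below illustrates the general idea of putting a partition number into the high bits of a 64-bit vertex id. Ids that share a partition prefix then sort next to each other in key order. The bit widths and function names are hypothetical, and the sketch does not reproduce Titan's actual id scheme.

```python
PARTITION_BITS = 4                     # hypothetical: 16 partitions
COUNTER_BITS = 63 - PARTITION_BITS     # keep ids positive in a signed 64-bit long


def make_vertex_id(partition: int, counter: int) -> int:
    """Place the partition number in the high bits, a per-partition counter below it."""
    if not 0 <= partition < (1 << PARTITION_BITS):
        raise ValueError("partition out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return (partition << COUNTER_BITS) | counter


def partition_of(vertex_id: int) -> int:
    return vertex_id >> COUNTER_BITS


def row_key(vertex_id: int) -> bytes:
    """Big-endian bytes, so byte order (what an ordered partitioner sees) matches numeric order."""
    return vertex_id.to_bytes(8, "big")


ids = [make_vertex_id(4, 0), make_vertex_id(3, 10**9), make_vertex_id(3, 7)]
print([partition_of(v) for v in sorted(ids, key=row_key)])  # [3, 3, 4]
```

Vertices that should be stored together would be given the same partition number, for example a course and its enrolled students (a hypothetical case). Choosing those numbers well is the kind of open question the slide points to.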

Page 29

Aurelius Graph Cluster

TITAN: stores a massive-scale property graph allowing real-time traversals and updates
FAUNUS: batch processing of large graphs with Hadoop
FULGORA: runs global graph algorithms on large, compressed, in-memory graphs

Data flows between the components: Map/Reduce, Load & Compress, Bulk Load, and analysis results back into Titan.

Apache 2

titan.thinkaurelius.com faunus.thinkaurelius.com
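Faunus runs global graph computations as Hadoop MapReduce jobs. The sketch below is a single-process illustration of that style, not Faunus's API. It computes an out-degree distribution, a typical graph statistic, with an explicit map phase, a shuffle that groups values by key, and a reduce phase. The edge list and function names are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one single-process map -> shuffle (group by key) -> reduce pass."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase emits (key, value) pairs
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical edge list: (out vertex, label, in vertex).
edges = [(1, "follows", 2), (1, "follows", 3), (2, "follows", 3),
         (3, "follows", 1), (4, "follows", 1)]

# Job 1: out-degree per vertex (vertices with no outgoing edges do not appear).
out_degree = map_reduce(edges, lambda e: [(e[0], 1)], lambda k, vs: sum(vs))

# Job 2: out-degree distribution, i.e. how many vertices have each out-degree.
distribution = map_reduce(out_degree.items(), lambda kv: [(kv[1], 1)],
                          lambda k, vs: sum(vs))

print(out_degree)    # {1: 2, 2: 1, 3: 1, 4: 1}
print(distribution)  # {2: 1, 1: 3}
```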

Page 30

Special Thanks

Steve Hill (@kindageeky), Director Architecture & Innovation at Pearson Education

Page 31

We are Hiring