Adding Value Through Graph Analysis Using Titan and Faunus

AURELIUS THINKAURELIUS.COM Adding Value Through Graph Analysis Matthias Broecheler, CTO @mbroecheler March V, MMXIII KNOWLEDGE INFORMATION DATA

Upload: matthias-broecheler

Post on 15-Jan-2015


DESCRIPTION

In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems. This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.

TRANSCRIPT

Page 1: Adding Value through graph analysis using Titan and Faunus

AURELIUS THINKAURELIUS.COM

Adding Value Through Graph Analysis

Matthias Broecheler, CTO @mbroecheler March V, MMXIII

KNOWLEDGE INFORMATION DATA

Page 2: Adding Value through graph analysis using Titan and Faunus

Communities of Interest

Finding Influencers

Understanding Behavior

Page 3: Adding Value through graph analysis using Titan and Faunus

Information Integration

Recommendation

Question Answering

Page 4: Adding Value through graph analysis using Titan and Faunus

Fraud Detection

Risk Analysis

Market Valuation

Page 5: Adding Value through graph analysis using Titan and Faunus

Data

Information

Knowledge

Value

Page 6: Adding Value through graph analysis using Titan and Faunus

Data

Information

Knowledge

2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632

userid:3552

addid:9914 clicked timestamp: 93932342

likes(Jane Joe, cute mammals):0.8
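
Note: the first step on this slide, from a raw ad-server log line (data) to a structured record (information), can be sketched in Python as follows. This is only an illustration: the slide shows the raw line but not a parser, so the function name and the output field names (`timestamp`, `ip`, `channel`, `uid`, `action`, `delay`, `position`) are assumptions.

```python
from urllib.parse import urlparse, parse_qs

RAW = ("2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; "
       "http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; "
       "ACTION=CLICK|DELAY=250|x=450|y=632")

def parse_log_line(line):
    """Split a semicolon-delimited log line into named fields (names are hypothetical)."""
    timestamp, ip, channel, url, event = [part.strip() for part in line.split(";")]
    # keep_blank_values retains value-less flags such as "flagtru"
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    # The last field is a pipe-separated list of key=value pairs
    event_fields = dict(pair.split("=", 1) for pair in event.split("|"))
    return {
        "timestamp": timestamp,
        "ip": ip,
        "channel": channel,
        "uid": query["uid"][0],
        "action": event_fields["ACTION"],
        "delay": int(event_fields["DELAY"]),
        "position": (int(event_fields["x"]), int(event_fields["y"])),
    }

record = parse_log_line(RAW)
print(record["uid"], record["action"], record["position"])
```

Going from such records to statements like likes(Jane Joe, cute mammals):0.8 requires aggregating over many records, which is where the graph analysis discussed in the rest of the deck comes in.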

Page 7: Adding Value through graph analysis using Titan and Faunus

Data

Information

Knowledge

2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632

userid:3552

addid:9914 clicked timestamp: 93932342

likes(Jane Joe, cute mammals):0.8

Graph Databases & Graph Analysis

Page 8: Adding Value through graph analysis using Titan and Faunus
Page 9: Adding Value through graph analysis using Titan and Faunus
Page 10: Adding Value through graph analysis using Titan and Faunus
Page 11: Adding Value through graph analysis using Titan and Faunus

I Graph Foundation

Page 12: Adding Value through graph analysis using Titan and Faunus

Graph

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Vertex Property

Page 13: Adding Value through graph analysis using Titan and Faunus

Graph

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge property: time:12

Edge

Edge Property

Edge Type
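
Note: the property graph model shown here (vertices and edges both carry key-value properties, and every edge has a type) can be sketched in a few lines of Python. The vertex properties come from the slide. The endpoints and direction of each edge are not recoverable from this transcript, so the ones below are assumptions for illustration, as are the class and field names.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    label: str             # the edge type, e.g. "father"
    out_v: Vertex          # tail (source) vertex
    in_v: Vertex           # head (target) vertex
    properties: dict = field(default_factory=dict)

hercules = Vertex(1, {"name": "Hercules", "type": "demigod"})
jupiter = Vertex(2, {"name": "Jupiter", "type": "god"})
cerberus = Vertex(3, {"name": "Cerberus", "type": "monster"})

edges = [
    # Assumed endpoints: the slide transcript preserves only labels and time:12
    Edge("father", hercules, jupiter),
    Edge("battled", hercules, cerberus, {"time": 12}),
]

for e in edges:
    print(e.out_v.properties["name"], e.label, e.in_v.properties["name"], e.properties)
```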

Page 14: Adding Value through graph analysis using Titan and Faunus

Path

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge property: time:12
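
Note: a path is a sequence of vertices connected by edges. A minimal Python sketch of finding one shortest path with breadth-first search follows. The edge endpoints are assumptions (the transcript keeps only the edge labels), and `shortest_path` is a hypothetical helper, not part of any library discussed in this deck.

```python
from collections import deque

# Assumed (label, target) pairs per source vertex; only the labels are from the slide.
OUT_EDGES = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"), ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"), ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}

def shortest_path(start, goal):
    """Breadth-first search; returns the vertex names along one shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _label, target in OUT_EDGES.get(path[-1], []):
            if target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None

print(shortest_path("Hercules", "Saturn"))  # ['Hercules', 'Jupiter', 'Saturn']
```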

Page 15: Adding Value through graph analysis using Titan and Faunus

Degree

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge property: time:12
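
Note: the degree of a vertex is the number of edges incident to it. A small Python sketch, again under the assumption that the edges run between the vertices listed in the comments (the transcript does not preserve the endpoints):

```python
from collections import Counter

# (source, label, target) triples; endpoints are assumed, labels are from the slide.
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]

out_degree = Counter(src for src, _, _ in EDGES)
in_degree = Counter(dst for _, _, dst in EDGES)

def degree(vertex):
    """Total degree: number of incident edges regardless of direction."""
    return out_degree[vertex] + in_degree[vertex]

print(degree("Jupiter"))  # 4: one incoming edge, three outgoing
```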

Page 16: Adding Value through graph analysis using Titan and Faunus

Aurelius Graph Cluster

TITAN: Stores a massive-scale property graph allowing real-time traversals and updates

FAUNUS: Batch processing of large graphs with Hadoop

FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs

Diagram labels: Map/Reduce; Bulk Load; Load; Analysis results back into Titan; Apache 2

Page 17: Adding Value through graph analysis using Titan and Faunus

II Titan Graph Database

Page 18: Adding Value through graph analysis using Titan and Faunus

Titan Features

 Real-time Big Graph Data

  Numerous Concurrent Users
  Many Short Transactions
    read/write
  Real-time Traversals (OLTP)
  High Availability
  Dynamic Scalability
  Variable Consistency Model
    ACID or eventual consistency

Page 19: Adding Value through graph analysis using Titan and Faunus

Storage Backends

Partitionability

Availability Consistency

Page 20: Adding Value through graph analysis using Titan and Faunus

$ ./titan-0.2.0/bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules')
==>v[4]
gremlin> v.out('father').out('brother').name

Page 21: Adding Value through graph analysis using Titan and Faunus

name: Jupiter type: god

name: Pluto type: god

name: Neptune type: god

name: Hercules type: demigod

name: Cerberus type: monster

name: Alcmene type: god

name: Saturn type: titan

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge property: time:12

gremlin> v.out('father').out('brother').name
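
Note: the traversal above reads "start at Hercules, follow outgoing father edges, then outgoing brother edges, and return the names". The Python sketch below mimics that evaluation on a plain dictionary. It is not the Gremlin or Titan API; the `out` helper is hypothetical, and the edge endpoints are assumptions because the transcript preserves only the edge labels.

```python
# Assumed (label, target) pairs per source vertex; only the labels are from the slide.
OUT_EDGES = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"), ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"), ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}

def out(vertices, label):
    """One traversal step: follow outgoing edges with the given label from every vertex."""
    return [target
            for v in vertices
            for edge_label, target in OUT_EDGES.get(v, [])
            if edge_label == label]

# Equivalent in spirit to v.out('father').out('brother').name
uncles = out(out(["Hercules"], "father"), "brother")
print(uncles)  # ['Neptune', 'Pluto'] under the assumed edges
```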

Page 22: Adding Value through graph analysis using Titan and Faunus

Vertex-Centric Indices

  Sort and index edges per vertex by primary key
    Primary key can be composite
  Enables efficient focused traversals
    Only retrieve edges that matter
  Uses push-down predicates for quick, index-driven retrieval

Page 23: Adding Value through graph analysis using Titan and Faunus

Edges of vertex v: fought (x2), father, mother, battled (x4; time: 1, 3, 5, 9)

v.query()

Page 24: Adding Value through graph analysis using Titan and Faunus

Edges of vertex v: father, mother, battled (x4; time: 1, 3, 5, 9)

v.query().direction(OUT)

Page 25: Adding Value through graph analysis using Titan and Faunus

Edges of vertex v: battled (x4; time: 1, 3, 5, 9)

v.query().direction(OUT).labels('battled')

Page 26: Adding Value through graph analysis using Titan and Faunus

Edges of vertex v: battled (x2; time: 1, 3)

v.query().direction(OUT).labels('battled').has('time', T.lt, 5)
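
Note: the effect of a vertex-centric index can be illustrated with a sorted list and two binary searches. The Python sketch below keeps the edges of one vertex sorted by a composite key (direction, label, time) and answers the query above by slicing, without scanning the edges that do not match. The edge data follows the slides; the key layout and the `edges_before` helper are assumptions for illustration and do not describe Titan's actual storage format.

```python
from bisect import bisect_left

# Edges incident to vertex v, keyed by (direction, label[, time]) as drawn on the slides.
# Using this tuple as the sort key is an assumed composite primary key.
index = sorted([
    ("IN", "fought"), ("IN", "fought"),
    ("OUT", "father"), ("OUT", "mother"),
    ("OUT", "battled", 1), ("OUT", "battled", 3),
    ("OUT", "battled", 5), ("OUT", "battled", 9),
])

def edges_before(direction, label, max_time):
    """Return edges with the given direction and label whose time is < max_time.

    Two binary searches bound one contiguous slice of the sorted index, so
    edges outside the predicate are never touched.
    """
    lo = bisect_left(index, (direction, label))
    hi = bisect_left(index, (direction, label, max_time))
    return index[lo:hi]

print(edges_before("OUT", "battled", 5))
```

With many thousands of edges per vertex, the cost of such a lookup grows with the logarithm of the edge count plus the size of the result, rather than with the total number of incident edges.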

Page 27: Adding Value through graph analysis using Titan and Faunus

Titan Features

I.  Data Management

II. Vertex-Centric Indices

Page 28: Adding Value through graph analysis using Titan and Faunus

Titan Features

III.  Graph Partitioning

IV.  Edge Compression

Page 29: Adding Value through graph analysis using Titan and Faunus

III TITAN 0.3.0 [-SNAPSHOT]

Page 30: Adding Value through graph analysis using Titan and Faunus

Titan Embedding

  Rexster RexPro
    lightweight Gremlin Server
    binary protocol
  Titan Gremlin Engine
  Embedded Storage Backend
    in-JVM method calls
  Native clients
    Java, Python, Clojure

Page 31: Adding Value through graph analysis using Titan and Faunus

Graph Indexing

  Vertex and Edge indexing

  Pluggable index provider

    ElasticSearch

    Lucene

  Full-text search

  Numeric range search

  Geographic search

Page 32: Adding Value through graph analysis using Titan and Faunus

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge properties: time:12, location: (38.071,23.745)

Page 33: Adding Value through graph analysis using Titan and Faunus

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge properties: time:12, location: (38.071,23.745)

g.query().has('age',Cmp.GREATER_THAN,5000).vertices()

Page 34: Adding Value through graph analysis using Titan and Faunus

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge properties: time:12, location: (38.071,23.745)

g.query().has('title',Txt.CONTAINS,'god').vertices()

Page 35: Adding Value through graph analysis using Titan and Faunus

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge properties: time:12, location: (38.071,23.745)

g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
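
Note: the semantics of this combined query (a numeric range condition and a full-text condition) can be checked against the vertex data on the slide with a short Python sketch. This is a plain linear scan that only illustrates what the query means; an external index provider would answer it without scanning. Matching the search term case-insensitively against whole tokens is an assumption about the full-text behaviour, and the helper names are hypothetical.

```python
# Vertex properties as shown on the slide.
VERTICES = [
    {"name": "Jupiter", "age": 4800, "title": "God of the heaven and skies"},
    {"name": "Pluto", "age": 4900, "title": "God of the underworld"},
    {"name": "Neptune", "age": 5200, "title": "God of the earth and ocean"},
    {"name": "Hercules", "title": "Divine hero"},
    {"name": "Cerberus", "title": "Ugly beast of the underworld"},
    {"name": "Alcmene", "age": 3300},
    {"name": "Saturn", "age": 5900},
]

def older_than(vertex, limit):
    """Numeric range predicate; vertices without an age never match."""
    return vertex.get("age") is not None and vertex["age"] > limit

def title_contains(vertex, term):
    """Assumed full-text semantics: case-insensitive match on whole tokens."""
    return term.lower() in vertex.get("title", "").lower().split()

matches = [v["name"] for v in VERTICES
           if older_than(v, 5000) and title_contains(v, "god")]
print(matches)  # ['Neptune']
```

Saturn satisfies the age condition but has no title, so only Neptune satisfies both conditions.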

Page 36: Adding Value through graph analysis using Titan and Faunus

name: Jupiter age: 4800 title: God of the heaven and skies

name: Pluto age: 4900 title: God of the underworld

name: Neptune age: 5200 title: God of the earth and ocean

name: Hercules title: Divine hero

name: Cerberus title: Ugly beast of the underworld

name: Alcmene age: 3300

name: Saturn age: 5900

Edge labels: father (x2), mother, brother (x2), battled, pet

Edge properties: time:12, location: (38.071,23.745)

g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
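
Note: a geographic "within circle" condition amounts to a great-circle distance test. The Python sketch below applies the haversine formula to the edge location on the slide. It assumes that the circle arguments are latitude, longitude, and a radius in kilometres, which the slide does not state, and it is an illustration rather than the index provider's implementation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_circle(point, center, radius_km):
    """True if point lies inside the circle around center (radius assumed to be in km)."""
    return haversine_km(*point, *center) <= radius_km

edge_location = (38.071, 23.745)   # location property from the slide
print(within_circle(edge_location, (38, 23), 100))
```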

Page 37: Adding Value through graph analysis using Titan and Faunus

IV Faunus Graph Analytics

Page 38: Adding Value through graph analysis using Titan and Faunus

Faunus Features

 Batch Big Graph Data

  Hadoop-based Graph Computing Framework
  Graph Analytics
  Breadth-first Traversals
  Global Graph Computations
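
Note: a breadth-first traversal over a whole graph can be evaluated as a sequence of global passes, which is what makes it a fit for batch frameworks such as Hadoop. In the Python sketch below every vertex holds a count of traversers; one "out" step sends each count along all outgoing edges (the map side) and sums what arrives at each vertex (the reduce side). The toy graph and the `step_out` helper are hypothetical, and the sketch illustrates the general idea only, not Faunus's implementation.

```python
from collections import Counter

# A toy directed graph as an adjacency list (hypothetical data).
OUT = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}

def step_out(counts):
    """One breadth-first 'out' step over the whole graph.

    Map side: every vertex sends its traverser count along each outgoing edge.
    Reduce side: every vertex sums the counts it receives.
    """
    received = Counter()
    for vertex, n in counts.items():
        for target in OUT.get(vertex, []):
            received[target] += n
    return received

start = Counter({v: 1 for v in OUT})       # one traverser at every vertex
two_hops = step_out(step_out(start))       # two successive 'out' steps
print(sum(two_hops.values()))              # 5 two-step paths in this toy graph
```

Because only counts are propagated, the work per pass is proportional to the number of edges even when the number of distinct paths grows very large.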

Page 39: Adding Value through graph analysis using Titan and Faunus

Faunus Architecture

g._()

Page 40: Adding Value through graph analysis using Titan and Faunus

Faunus Work Flow

g.V.out.out.count()

hdfs://user/ubuntu/
  output/job-0/
  output/job-1/
  output/job-2/ { graph*, sideeffect* }

Compressed HDFS Graphs
  stored in sequence files
  variable length encoding
  prefix compression
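
Note: the two compression techniques named on this slide can be sketched generically in Python. `encode_varint` uses the common 7-bits-per-byte variable-length integer scheme, and `front_code` stores each sorted key as the length of the prefix it shares with its predecessor plus the remaining suffix. Both are textbook versions shown under the assumption that they resemble what is meant here; the slide does not specify Faunus's actual byte layout, and the function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, low group first.

    The high bit of each byte signals that another byte follows.
    """
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Inverse of encode_varint for a single encoded integer."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

def front_code(sorted_keys):
    """Prefix compression: (shared-prefix length, remaining suffix) per key."""
    previous, coded = "", []
    for key in sorted_keys:
        shared = 0
        while shared < min(len(previous), len(key)) and previous[shared] == key[shared]:
            shared += 1
        coded.append((shared, key[shared:]))
        previous = key
    return coded

print(encode_varint(300).hex())                        # ac02
print(front_code(["battled", "brother", "father"]))
```

Small integers take one byte instead of a fixed four or eight, and sorted keys that share long prefixes shrink to short suffixes.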

Page 41: Adding Value through graph analysis using Titan and Faunus

Aurelius Graph Cluster

TITAN: Stores a massive-scale property graph allowing real-time traversals and updates

FAUNUS: Batch processing of large graphs with Hadoop

FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs

Diagram labels: Map/Reduce; Bulk Load; Load; Analysis results back into Titan; Apache 2

Page 42: Adding Value through graph analysis using Titan and Faunus

What’s New

  Faunus 0.1 released
  Bulk Import / Export for Titan
    loaded graph into Titan
    loading derivations into Titan
  RDF support
  Many optimizations
    vertex compression

Page 43: Adding Value through graph analysis using Titan and Faunus

Faunus Setup

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true

gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003

Page 44: Adding Value through graph analysis using Titan and Faunus

Build a Knowledge Graph

  Based on DBPedia
    Graph version of Wikipedia
    ~290 million edges (~1B triples)

1.  Bulk load RDF into Faunus
      6 m1.xlarge
2.  Convert to property graph
3.  Bulk load into Titan
      3 m1.xlarge with Cassandra
4.  OLTP+OLAP

  Total Time: ~ 2 hours
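
Note: step 2, converting RDF to a property graph, can be sketched in Python under one common convention: triples whose object is a literal become vertex properties, and triples whose object is another resource become edges. The slide does not describe the conversion rules that were actually used, so this convention, the sample triples, and the `to_property_graph` helper are all assumptions for illustration.

```python
# Hypothetical (subject, predicate, object) triples; a leading '"' marks a literal.
TRIPLES = [
    ("dbr:Random_walk", "rdfs:label", '"Random walk"'),
    ("dbr:Random_walker_algorithm", "dbo:wikiPageWikiLink", "dbr:Random_walk"),
    ("dbr:Random_walker_algorithm", "rdfs:label", '"Random walker algorithm"'),
]

def to_property_graph(triples):
    """Literal-valued triples become vertex properties; the rest become edges."""
    vertices, edges = {}, []
    for subject, predicate, obj in triples:
        vertices.setdefault(subject, {})
        if obj.startswith('"'):
            vertices[subject][predicate] = obj.strip('"')
        else:
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges

vertices, edges = to_property_graph(TRIPLES)
print(len(vertices), len(edges))  # 2 1
```

Under this convention a data set yields fewer edges than triples, which may be one reason the slide reports about 290 million edges for roughly one billion triples.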

Page 45: Adding Value through graph analysis using Titan and Faunus

gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]

gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)

Graph OLTP

Page 46: Adding Value through graph analysis using Titan and Faunus

gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages

Page 47: Adding Value through graph analysis using Titan and Faunus

Aurelius Graph Cluster

TITAN: Stores a massive-scale property graph allowing real-time traversals and updates

FAUNUS: Batch processing of large graphs with Hadoop

FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs

Diagram labels: Map/Reduce; Bulk Load; Load; Analysis results back into Titan; Apache 2

Page 48: Adding Value through graph analysis using Titan and Faunus

The Graph Landscape

[Figure: y-axis "Speed of Traversal/Process", x-axis "Size of Graph". Illustration only, not to scale.]

Page 49: Adding Value through graph analysis using Titan and Faunus

TINKERPOP.COM

Page 50: Adding Value through graph analysis using Titan and Faunus

Thanks!

Vadas Gintautas @vadasg

Marko Rodriguez @twarko

Stephen Mallette @spmallette

Daniel LaRocque

Page 51: Adding Value through graph analysis using Titan and Faunus

We are Hiring