Graph Processing with Apache TinkerPop (incubating) Jason Plurad Software Engineer, IBM | Committer, Apache TinkerPop


Post on 14-Apr-2017


Page 1: Graph Processing with Apache TinkerPop

Graph Processing with Apache TinkerPop (incubating)

Jason Plurad
Software Engineer, IBM | Committer, Apache TinkerPop

Page 2: Graph Processing with Apache TinkerPop

• Project Update
• Graph Landscape
• A Graph Problem
• Hands-On Graph

http://tinkerpop.apache.org

Page 3: Graph Processing with Apache TinkerPop

About Me
• Twitter @pluradj
• GitHub @pluradj
• Open channels
  – TinkerPop mailing lists
  – Titan mailing list
  – Stack Overflow

Page 4: Graph Processing with Apache TinkerPop

(Apache) TinkerPop (incubating)
• 2009: Inception
• 2012: TinkerPop 2
• 2015: Apache Incubator
• 2016: Top Level Project?
  – TLP VOTE passed!
  – Waiting on board meeting to establish TLP

Page 5: Graph Processing with Apache TinkerPop

Podling Releases

• 3.0
  – Major refactor, Java 8 lambda expressions, Gremlin Server, OLAP graph computers
• 3.1
  – Hadoop 2 support, persisted RDDs
• 3.2
  – OLAP job chaining, OLAP graph filters, performance improvements

Page 6: Graph Processing with Apache TinkerPop

Common graph data domains
• Social Network Analysis
• Configuration Management Database
• Master Data Management
• Recommendation Engines
• Knowledge Graphs
• Internet of Things

Page 7: Graph Processing with Apache TinkerPop

Property Graph and Gremlin
• Structure
  – Vertex
  – Edge
  – Properties
• Gremlin
  – Domain specific language (DSL) for graph
  – Data flow: forward and backward
  – Traversal Steps
  – Bindings for non-JVM languages
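The structure on this slide (vertices, edges, and properties on both) can be sketched in a few lines of plain JavaScript. This is only an illustration and not the TinkerPop API: the helper names `addVertex`, `addEdge`, `out`, and `into` are hypothetical, and the sample data is made up. The two lookup helpers mirror the idea of following edges forward and backward.

```javascript
// Minimal in-memory property graph (illustrative only, not the TinkerPop API).
const vertices = new Map(); // id -> { id, label, properties }
const edges = [];           // { label, outV, inV, properties }

function addVertex(id, label, properties = {}) {
  vertices.set(id, { id, label, properties });
}

function addEdge(label, outV, inV, properties = {}) {
  edges.push({ label, outV, inV, properties });
}

// Forward data flow: follow outgoing edges with the given label.
function out(id, label) {
  return edges.filter(e => e.outV === id && e.label === label).map(e => e.inV);
}

// Backward data flow: follow incoming edges with the given label.
function into(id, label) {
  return edges.filter(e => e.inV === id && e.label === label).map(e => e.outV);
}

addVertex('app', 'package', { name: 'app' });
addVertex('left-pad', 'package', { name: 'left-pad' });
addEdge('dependency', 'app', 'left-pad', { range: '^1.0.0' });

console.log(out('app', 'dependency'));       // packages that app depends on
console.log(into('left-pad', 'dependency')); // packages that depend on left-pad
```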

Page 8: Graph Processing with Apache TinkerPop

Apache TinkerPop Graph Computing Framework

Page 9: Graph Processing with Apache TinkerPop

Graph Landscape
• Graph database vs Graph processor
  – OLTP vs OLAP
  – Neighborhood vs whole graph
• Multi-model: not the only store in your app

Page 10: Graph Processing with Apache TinkerPop

IBM Graph (Beta)

• Managed Graph-as-a-Service (OLTP)
• Focus on your data, not install and operations
• #sleepMore

http://ibm.biz/IBMGraph

Page 11: Graph Processing with Apache TinkerPop

What is this?

module.exports = xxxxxxx;
function xxxxxxx (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

Page 12: Graph Processing with Apache TinkerPop

A Graph Problem: Dependency Management

• On March 22, 2016 npm broke the Internet
• Left-pad was unpublished
  – 11 lines of code
  – WTFPL license
  – Hundreds of breaking builds per minute
  – http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
• Are we safe with Apache?

Page 13: Graph Processing with Apache TinkerPop

Questions for the graph
• Which dependencies are at risk?
• Which ones should be refactored to avoid?
• Risk factors
  – Unsuitable license
  – Single developer
  – Too little code / Too much code
  – Changes too frequently / Code is stagnant
  – Nobody else is using it
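As a rough sketch of how the risk factors on this slide might be checked per package, the snippet below flags a package against a few of them. Everything specific here is an assumption for illustration: the field names, the accepted-license list, and the line-count thresholds are not from the talk, and the change-frequency factors are left out because they would need commit history.

```javascript
// Hypothetical package summaries; the field names and values are made up for illustration.
const packages = [
  { name: 'tiny-util', license: 'WTFPL', maintainers: 1, linesOfCode: 11, dependents: 0 },
  { name: 'big-lib', license: 'Apache-2.0', maintainers: 12, linesOfCode: 40000, dependents: 900 },
];

// Assumed policy and thresholds, not values from the talk.
const ACCEPTED_LICENSES = new Set(['Apache-2.0', 'MIT', 'BSD-3-Clause']);
const MIN_LINES = 50;
const MAX_LINES = 100000;

// Return the list of risk factors that apply to one package.
function riskFactors(pkg) {
  const risks = [];
  if (!ACCEPTED_LICENSES.has(pkg.license)) risks.push('unsuitable license');
  if (pkg.maintainers <= 1) risks.push('single developer');
  if (pkg.linesOfCode < MIN_LINES) risks.push('too little code');
  if (pkg.linesOfCode > MAX_LINES) risks.push('too much code');
  if (pkg.dependents === 0) risks.push('nobody else is using it');
  return risks;
}

for (const pkg of packages) {
  console.log(pkg.name, riskFactors(pkg));
}
```

In a graph, counts such as `maintainers` and `dependents` would come from edge counts rather than from fields stored on the package.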

Page 14: Graph Processing with Apache TinkerPop

Let’s go for a ride!

Page 15: Graph Processing with Apache TinkerPop

Titan (Aurelius)
• Pick a graph database for OLTP…
  – Apache license but not in ASF
• Code has stagnated in the open
  – DataStax Enterprise (DSE) Graph
  – Wide open opportunities
    • Genesis Graph is up next!
    • Apache S2Graph (incubating)
    • Apache Flink (Gelly)
    • Apache Solr (Graph Query)

Page 16: Graph Processing with Apache TinkerPop

Apache Spark or Apache Giraph
• Pick a graph processor for OLAP…
  – Spark is the new hotness
  – Giraph is better suited for gigantic graphs
• By using Apache TinkerPop and Gremlin, we can use either one seamlessly

Page 17: Graph Processing with Apache TinkerPop

Vagrant and VirtualBox
• Developers don’t always get keys to the cloud
• Virtual machines to the rescue
  – Host: 16 GB RAM or more
  – 3-4 VMs with 3 GB RAM
• Prove out your graph algorithms on a small data set before wasting time on a big data set

Page 18: Graph Processing with Apache TinkerPop

Apache Ambari
• Simple install for Apache Hadoop and related Apache big data packages
  – HDFS, YARN, MapReduce, HBase, Spark, etc.
• Management and monitoring dashboard
• Enables integration of other software

Page 19: Graph Processing with Apache TinkerPop

Getting the data
• NPM registry runs on Apache CouchDB
• Replication in Apache CouchDB is awesome
  – https://skimdb.npmjs.com/registry
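CouchDB replication is started by posting a JSON document to the `_replicate` endpoint. The sketch below only builds that request and does not send it. The source URL is the one from the slide; the local CouchDB address `http://localhost:5984`, the target database name `registry`, and the helper `replicationRequest` are assumptions for illustration. Authentication and the current availability of the source are not covered.

```javascript
// Build (but do not send) a CouchDB replication request.
// SOURCE comes from the slide; LOCAL_COUCH and the target database name are assumptions.
const SOURCE = 'https://skimdb.npmjs.com/registry';
const LOCAL_COUCH = 'http://localhost:5984';

function replicationRequest(source, targetDb) {
  return {
    method: 'POST',
    url: `${LOCAL_COUCH}/_replicate`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      source,
      target: `${LOCAL_COUCH}/${targetDb}`,
      create_target: true, // ask CouchDB to create the target database if it is missing
    }),
  };
}

const request = replicationRequest(SOURCE, 'registry');
console.log(request.url);
console.log(request.body);
```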

Page 20: Graph Processing with Apache TinkerPop

Transform the data
• Apache CouchDB is a document store
• Dependencies are graph data
• Other things can be too
  – Users
  – Keywords
  – License
• Graph model depends on the questions you want to ask of the graph
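One way to picture the document-to-graph step is a function that turns a single registry document into vertices and edges. The document below is a simplified, hypothetical stand-in for a real registry entry, and the helper `toGraph` is invented for this sketch. The edge labels `keyword` and `maintainer` are assumptions; the talk does not say how those relationships were labeled.

```javascript
// A simplified, hypothetical registry document; real npm documents carry many more fields.
const doc = {
  name: 'app',
  license: 'MIT',
  keywords: ['cli', 'graph'],
  maintainers: [{ name: 'alice' }],
  dependencies: { 'left-pad': '^1.0.0' },
  devDependencies: { mocha: '^2.4.5' },
};

// Turn one document into labeled vertices and edges.
function toGraph(d) {
  const vertices = [{ label: 'package', id: d.name }];
  const edges = [];
  const link = (edgeLabel, vertexLabel, id) => {
    vertices.push({ label: vertexLabel, id });
    edges.push({ label: edgeLabel, from: d.name, to: id });
  };
  if (d.license) link('license', 'license', d.license);
  for (const k of d.keywords || []) link('keyword', 'keyword', k);
  for (const m of d.maintainers || []) link('maintainer', 'person', m.name);
  for (const dep of Object.keys(d.dependencies || {})) link('dependency', 'package', dep);
  for (const dep of Object.keys(d.devDependencies || {})) link('devDependency', 'package', dep);
  return { vertices, edges };
}

const graph = toGraph(doc);
console.log(graph.edges.map(e => `${e.from} -${e.label}-> ${e.to}`));
```

A real loader would also de-duplicate vertices shared by many documents, such as a common license or keyword.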

Page 21: Graph Processing with Apache TinkerPop

NPM Graph Schema
• Vertex labels and counts
  – Document: 250K
  – Package: 1.5M
  – Keyword: 81K
  – License: 2K
  – Person: 125K
• Edge labels
  – license
  – dependency
  – devDependency

Page 22: Graph Processing with Apache TinkerPop

Hands-On: Gremlin Console

https://asciinema.org/a/21qk1rn9yt6tt7sour9w9ynxn

Page 23: Graph Processing with Apache TinkerPop

The Graph Computer

Page 24: Graph Processing with Apache TinkerPop

Anatomy of a Vertex Program
• Vertex-centric graph logic
• Parallel execution (BSP)
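The vertex-centric, bulk synchronous parallel (BSP) idea can be sketched without any framework. In the toy example below, every vertex repeatedly adopts the smallest label it hears from its neighbors, which ends up labeling connected components. This is a single-threaded illustration on made-up data, not TinkerPop's VertexProgram interface; the variable names are hypothetical.

```javascript
// Undirected toy graph as an adjacency list (made-up data).
const neighbors = {
  a: ['b'],
  b: ['a', 'c'],
  c: ['b'],
  d: ['e'],
  e: ['d'],
};

// Vertex state: each vertex starts with its own id as its component label.
const label = {};
for (const v of Object.keys(neighbors)) label[v] = v;

// Superstep 0: every vertex sends its label to its neighbors.
let inbox = {};
for (const v of Object.keys(neighbors)) {
  for (const n of neighbors[v]) (inbox[n] = inbox[n] || []).push(label[v]);
}

let supersteps = 0;
while (Object.keys(inbox).length > 0) {
  const outbox = {};
  // Each vertex looks only at its own state and its incoming messages,
  // so this loop could run in parallel across vertices.
  for (const v of Object.keys(inbox)) {
    const smallest = inbox[v].reduce((m, x) => (x < m ? x : m));
    if (smallest < label[v]) {
      label[v] = smallest;
      for (const n of neighbors[v]) (outbox[n] = outbox[n] || []).push(smallest);
    }
  }
  inbox = outbox; // barrier: messages are delivered in the next superstep
  supersteps++;
}

console.log(label);
```

The computation stops when a superstep produces no messages, which is the usual termination condition in this style of program.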

Page 25: Graph Processing with Apache TinkerPop

Out of the box Vertex Programs
• Traversal
• Bulk Loader
• Bulk Dumper
• PageRank
• Peer Pressure
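To give a feel for what one of these programs computes, here is a small iterative PageRank sketch. It is a textbook-style formulation on made-up data, not the code or the exact parameters of TinkerPop's PageRank vertex program; the function name `pageRank`, the damping factor of 0.85, and the iteration count are assumptions.

```javascript
// Directed toy graph: each key links to the vertices in its list (made-up data).
const links = {
  a: ['b', 'c'],
  b: ['c'],
  c: ['a'],
};

function pageRank(graph, damping = 0.85, iterations = 30) {
  const ids = Object.keys(graph);
  const n = ids.length;
  let rank = {};
  for (const v of ids) rank[v] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    const next = {};
    for (const v of ids) next[v] = (1 - damping) / n;
    for (const v of ids) {
      // Each vertex splits its current rank evenly across its outgoing edges.
      // This sketch assumes every vertex has at least one outgoing edge.
      const share = rank[v] / graph[v].length;
      for (const target of graph[v]) next[target] += damping * share;
    }
    rank = next;
  }
  return rank;
}

const ranks = pageRank(links);
console.log(ranks);
```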

Page 26: Graph Processing with Apache TinkerPop

Hands-On: Graph Program

Page 27: Graph Processing with Apache TinkerPop

OLAP Traversal Sources
> graph = GraphFactory.open('conf/npmgraph-olap.properties')
> g = graph.traversal().withComputer(SparkGraphComputer)
> g = graph.traversal().withComputer(GiraphGraphComputer)

Graph Statistics via TraversalVertexProgram
> g.V().count() // vertex count
> g.E().count() // edge count
> g.V().label().groupCount() // vertex label distribution
> g.E().label().groupCount() // edge label distribution
> g.V().properties().key().groupCount() // vertex property distribution
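For readers without a Gremlin Console at hand, the statistics above can be imitated on a toy in-memory graph. The snippet is a plain JavaScript sketch on made-up data; the helper `groupCount` is hypothetical and only resembles the Gremlin step of the same name in spirit.

```javascript
// Toy element lists standing in for g.V() and g.E() (made-up data).
const vertices = [
  { label: 'package', properties: { name: 'app' } },
  { label: 'package', properties: { name: 'left-pad' } },
  { label: 'license', properties: { name: 'WTFPL' } },
];
const edges = [
  { label: 'dependency' },
  { label: 'license' },
];

// Count how often each key occurs, similar in spirit to Gremlin's groupCount().
function groupCount(keys) {
  const counts = {};
  for (const k of keys) counts[k] = (counts[k] || 0) + 1;
  return counts;
}

console.log(vertices.length);                        // vertex count
console.log(edges.length);                           // edge count
console.log(groupCount(vertices.map(v => v.label))); // vertex label distribution
console.log(groupCount(edges.map(e => e.label)));    // edge label distribution
console.log(groupCount(vertices.flatMap(v => Object.keys(v.properties)))); // property key distribution
```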

Page 28: Graph Processing with Apache TinkerPop

Next stop? More data!
• Graphs are for connecting data!
• Consume data from GitHub
  – User data
  – Static code analysis
  – Code usage analysis
• Consume data from Twitter
  – Trending news
  – Security alerts

Page 29: Graph Processing with Apache TinkerPop

Summary

• Apache TinkerPop is for graph computing
• OLTP vs OLAP is an important distinction
  – Gremlin allows you to seamlessly bridge the two
• Graph thinking is different than relational
  – Is the future multi-model?
• Many opportunities to innovate in this space

Page 30: Graph Processing with Apache TinkerPop

Acknowledgements
• Marko Rodriguez
  – Gremlin language, Gremlin OLAP
• Ketrina Yim
  – Illustrator, creator of Gremlin and friends
• Stephen Mallette
  – TinkerPop release manager, Gremlin applications
• Daniel Kuppitz
  – Gremlin language guru
• David Robinson
  – Big data, multi-model architect/developer

Page 31: Graph Processing with Apache TinkerPop

Questions?

Page 32: Graph Processing with Apache TinkerPop

Thank you!