graph mining: project presentation - hasso plattner …...graph mining ws 2017 project rules 7 the...

35
Graph Mining: Project presentation Graph Mining course Winter Semester 2017 Davide Mottin, Anton Tsitsulin Hasso Plattner Institute

Upload: others

Post on 12-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

Graph Mining: Project presentation

Graph Mining course Winter Semester 2017

DavideMottin,AntonTsitsulinHassoPlattnerInstitute

Page 2: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Lecture road

2

Projectdescription

Datasets

Toolsandlibraries

Page 3: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Lecture road

3

Projectdescription

Datasets

Toolsandlibraries

Page 4: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

What is the project about?

▪TheprojectisananalysisofaSIGNIFICANTLYlargenetwork(>10knodes/edges)▪Chooseonedataset,convertintoagraphandperformananalysis(predictions,statistics,communities,…)tohighlightcertainaspectsofthedata▪Choosetheanalysesyouwanttoperformandhighlightandthemethodsyouwanttouse(e.g.,correlationswithdegrees,labelpropagation,…)▪Wewillprovidesomelinkstodatasetsandavailablealgorithms,butyouarewelcometochooseothersaswell▪Youshouldevaluatetheperformanceoftheapproachanditsresults,reportyourfindings,showwhytheanalysisisusefulandproposeideas/directionstoimproveit

4

Page 5: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Algorithms: five options

5

1. Requestthecodefromtheauthorsofapaperyoulikeorimplementthetechniquebyyourself(becareful,itrequiresalotoftime!)

2. ImplementaSIMPLEalgorithmforaspecifictaskbyyourself3. Useoneofthetoolswepresent(orsimilartoolforgraph

analysis)4. Comparetwoormoreexistingtechniquesinatleastone

dimension(efficiency,effectiveness,scalability,accuracy,specificlimitations/characteristics,maximumdatasizetheycanhandle,...)

5. LookupGitHubsearch;)

Page 6: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Project’s general goal

6

▪ Themainideaistohavetheinsightsofthenetworkwithaspecifictechnique,retrieveinterestingfacts,andcriticallyanalyzethealgorithm’sperformanceandresults▪ Startbysomehypothesisonthedata,forinstance:• peopleinBerlintendtolikeelectronicmusic• fantasymoviesareratedhigheramongyoungpeople

• Chessplayerswiththesamescorewillplaytogethermoreoften

▪ Theprojectshouldbefunandgiveyouanideaofthepropertiesofthenetworkandthealgorithm(s)ofchoice!▪ Youcanchangethealgorithmtofitspecificpurposes▪ Theanalysisshouldnotbetrivialbutalsonottoosophisticated

Page 7: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Project rules

7

▪ Theprojectcanbedoneingroupsof2or3people▪ Allprojectsmustbedifferent▪ Thehandoutisareportofmax3-5pageswithasetofstatisticsandinsights

▪ Presentationoftheresultsmustbeunderstandableandconclusive(visualexamples,differentusecases)• YouMUSTvisualizesomeimportantpartsofthegraph:for

nodeclassificationyoucanshowdifferentclassifiednodes,forgraphqueryinghowthealgorithmbehaveswithparticularpatterns(stars,triangles,cores...),..

Page 8: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Report rules

▪UsethefollowingLaTextemplate:tiny.cc/GMREP▪ThereportmustbehandedoveronFeb16th,noextensionisallowed▪ThereportshouldbesentviaemailinasinglePDFfileintheformat:[Last-name-1]-[Last-name-2]-report.pdf

8

Page 9: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Report content (more details in the tex file)

1. Maincharacteristicsofthedataset(s)2. Datapreprocessingstepsandstoringchoices3. Descriptionofthemethodologyandthekindofanalysis4. Presentationoftheresults1. Showusefulnessandnoveltyoftheresults2. Comparetheruntimeandthescalabilityoftheusedmethod

5. Commentonthelimitationsoftheusedmethod

9

Page 10: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Important dates

▪22/11:Groupsandpreliminaryideasondata▪20/12:Informalfeedbackontheprojects(noevaluation)▪31/01-07/02:Projectin-classpresentation▪16/02:Reporthandover(viaemail)

10

Page 11: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Community Detection: Analysis of Political communities in Reddit

11

▪Data•Redditisasocialnewswebsite•Discussionsareorganizedingroups(subreddits)•234Millionusers,>100ksubreddits• Theprojectanalyses2014politicssubreddit

▪Preprocessing•Nodesareusers,edgesareinteractionsamongthem•Keeponlyconnectedcomponents• Twodifferentsubsets

▪Algorithms•ModularityoptimizationwithGirvan-Newmann• Spectralclustering

▪Networkanalysis•Peoplewithsimilarpoliticalthoughtsclustertogether

Preprocess the data in a smart way

Page 12: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Node Classification: League of Legends

12

▪Data• LeagueofLegendsisaMultiplayerOnlineBattleArena(MOBA)• 100Millionplayers/month• 5playerscompeteeachotherineverybattle• Theprojectpredictsuserskillsandclassificationofchampionroles

▪ Preprocessing• Nodesareplayers,edgesarebattlesorheroselections• 180.000matches,6patches• Everypatchchangesthedynamicsofthegame

▪Algorithms• Spectralclusteringonnormalizedadjacency• Clusteringcoefficientalgorithm+severalvariants

▪Networkanalysis• Hypothesis:Clusterrepresentrolesinthegame,sinceeveryheroisusuallyassignedtothesamerole.

Try small variations of algorithms (e.g. weighted)!

Page 13: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Other good example projects

▪ TemporalAnalysisofResilienceintheUSAirlineNetwork

▪ StructureandEvolutionofBitcoinTransactionNetwork

▪ PlayerRankingsandOutcomePredictioninTournaments

▪ LeagueofLegends:TheMetaasaFunctionofTimeandRank

▪ PredictingRevertsinWikiGraphProperties

▪ TheInterplayofMoneyandPoliticsintheUnitedStatesHouseofRepresentatives

▪ VideoGameCommunityAnalysis-DestinyClanWars

▪ IdentifyingEfficientLearningPaths

▪ PredictingNewRatingsandCreatingBusinessRecommendationsonYelp

▪ TheTimetheyarea-Changin':EvolvingCommunitiesinaMusicianNetwork

▪ DetectingandAnalyzingPoliticalCommunitiesinPolticsSub-RedditComments

▪ LinkPredictionforPinterestBoardRecommendation

▪ CommunityDetectionandNetworkAnalysisonRandomActsofPizza

▪ UnderstandingDistanceEffectsinGlobalFoodTradeThroughNetworkAnalysis

▪ PredictingStockMovementsUsingMarketCorrelationNetworks

▪ ArrestCharges’(Non)-IndependenceinBlackandWhiteMen

▪ HotorNot?FindingStarProjectsinCodingSocialNetwork

13

Page 14: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Lecture road

14

Projectdescription

Datasets

Toolsandlibraries

Page 15: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Datasets

▪Commonsources:•Wikipedia,Wikidata,…•Academictorrents• /r/datasets• govdata.de,daten.berlin.de–GermanyOpenDataLaw•ArchiveTeam• SNAP

▪Wecanprovidesomedatasets/samples▪Preprocessingshouldnoteatupallyourprojecttime

• Seekforpartiallypreprocesseddata

▪Don’tbeafraidtosubsample!

15

Page 16: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Wikipedia

▪Motherofmanygraphminingprojects•Manytypesoflinks,richdata•Completehistoryisavailable

▪Wikidataoffersfreepreprocessing•Automaticallyextractslinksfromwikiindifferentlanguages

▪Don’tforgetotherprojects:•Wikiquote

16

Page 17: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Wikipedia: example projects

▪Hiddeninfluencersinhumanhistory•Person-to-persongraphfromwikidata•Compareinfluencersfromgraphperspectivetoarticlelengths

▪TwinTowns(Sistercities)•BerlinisasistercityofMoscow,London,andWindhoek(Namibia)• Isthereanystructureinhowtheyarecreated?Communities?

▪Warnetworkbetweencountries• FranceandEnglandarearch-nemesis• Italyandliterallyanycountryaround•Canwedetectmore?

17

Page 18: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Chess networks

▪ lichess.org▪FIDEratedgames

▪Examplequestions:•Areplayersofsimilarratingsgroupedtogether?•Doplayerslearnopeningsfromeachother?•Howconnectedarethelow-ratedplayers?•Dochessopeningsform“playertypes”?

18

Page 19: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

DoTA networks

▪YetanotherMOBAgame▪1.1billionmatchesavailable

▪Exampleanalyses:•Howdonewheroesaffectthegame•Howtheherocentralitychangesovertime•…youareprobablybetterexperts

19

Page 20: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Lecture road

20

Projectdescription

Datasetsfortheproject

Toolsandlibraries

Page 21: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Graph data repositories

21

▪ https://snap.stanford.edu/data/▪ http://konect.uni-koblenz.de/▪ http://irl.cs.ucla.edu/topology/#data▪ http://networkrepository.com/▪ https://networkdata.ics.uci.edu/index.php▪ http://nexus.igraph.org/api/dataset_info▪ http://www.geonames.org/export/▪ https://github.com/caesar0301/awesome-public-datasets▪ http://www.icwsm.org/2016/datasets/datasets/▪ https://datahub.io/dataset?tags=ontology&_tags_limit=0

Page 22: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Graph indexing and graph querying

22

● Graphcreation,graphgeneration,computingstructuralproperties,visualization○ iGraph(R,Python,C)○ Snappy(Python)○ Jung(Java)○ NetworkX(Python)

● GRAIL:reachabilityqueries● RQ-tree:Reliabilitysearchinuncertaingraphs

Page 23: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

More features of the above-mentioned libraries

23

● iGraphR/Python/C,NetworkX:attribute-basedquerying● iGraphC:computingspanningtrees,clusteringcoefficient,

examininggraphisomorphism● Snappy:findingcommunities,computingnodecentrality● Jung:computingmaximumflow● NetworkX:longlistofalgorithms!

Page 24: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Frequent subgraph mining

24

• Gradoop(Java,Hadoop)• distributedgraphanalyticsincludingafrequentsubgraph

miningapproach• XifengYansoftwarepackages

• gSpan,frequentsubgraphminingalgorithm(Java/C++)• Gupta2014,approachforinterestingsubgraph

discoveryinsocialnetworks(Java)• MUSK:MarkovChainMonteCarlomethodtoguarantee

maximallyfrequentsubgraphstobesampled

Page 25: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Social network analysis

25

• Analysis• Snappy(Python):communitydetection,connected

componentcomputation,k-corecomputation,etc.• Analysisandvisualization

• Windowssoftwarepackages• Ucinet(text-book)• Pajek

• Othertools• Gephivisualizationtool:computationofsimplemetrics,

visualizingtimestamps,findingclusters• NodeXL:Microsoftexceltool

• MoretoolsforSNA

Page 26: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Multi-purpose libraries

26

● Pegasus(Java,Hadoop)○ parallelcomputationofnodedegree,PageRankscore,

connectedcomponents,randomwalkswithrestart● Gradoop(Java,Hadoop)

○ distributedgraphanalytics● Giraph(Java,Hadoop)

○ large-scalegraphprocessing

Page 27: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Multi-purpose libraries (2)

27

● NetworkX(Python)○ Algorithms:cliques,homophily,communities,clusters,link

prediction,etc.● GraphX(Spark,Scala)

○ parallelcomputationoftriangles,connectedcomponents,PageRankscore

● GraphDatabases○ neo4j(Cypher,Java):tutorialandfeatures○ sparksee(.Net,C++,Python,Objective-C,Java)

Page 28: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Multi-purpose libraries (3)

28

● Graph-tool(Python)○ graphstatistics,centralitymeasures,topological

algorithms,networkinference● jGraphT(Java)

○ graphtheoryalgorithms○ visualization

Page 29: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Diffusion models

29

● Netinf(SNAP)○ informationcascadetracking

● NDVT(R)○ R-basedtoolthatrendersdynamicnetworkdatafrom

'networkDynamic'objectsasmovies,interactiveanimations,orotherrepresentationsofchangingrelationalstructuresandattributes(documentation)

● EpiModel(R)○ R-basedtoolforsimulatingandanalyzingmathematical

modelsofinfectiousdisease(documentation)

Page 30: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Node classification and similarity

▪ FaBP• DanaiKoutra,PKDD2011:FastBeliefPropagationfornode

classificationingraphs

▪ LinBPandSSLH• Linearizedandimprovedsinglepassbeliefpropagationand

semi-supervisedwithhomophily

▪ Anyclassifier(suchasLinearRegressioninNumpy)

30

Page 31: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Link prediction

31

• NetworkX(Python)• algorithmlistincludingalinkpredictionapproach

• Githubprojects• LPmade(Java)

• manual• LinkPred(Python)

Page 32: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Graph summarization

32

● Vog(Python)○ DanaiKoutra,StatisticalAnalysisandDatamining2015:

SummarizingandUnderstandingLargeGraphs

Page 33: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Graph Embeddings

▪Deepwalk:https://github.com/xgfs/deepwalk-c▪Node2vec:https://github.com/aditya-grover/node2vec▪NEU:https://github.com/thunlp/NEU▪LINE:https://github.com/tangjianpku/LINE▪HOPE:https://github.com/AnryYang/HOPE

33

Page 34: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Data and algorithm selection

● Youarewelcometochoosethedatasetandalgorithm/toolyouprefer,evenoutsidethelist!

● Letusknowaboutyourdecisionbeforeyoubeginworkingonyouranalysis,sothatwecangiveyoufeedbackandhelpifnecessary:)

34

Page 35: Graph Mining: Project presentation - Hasso Plattner …...GRAPH MINING WS 2017 Project rules 7 The project can be done in groups of 2 or 3 people All projects must be different The

GRAPH MINING WS 2017

Time for discussion

35