Graph-RAT Overview
By Daniel McEnnis
2/32
What is Graph-RAT
Relational Analysis Toolkit
Database abstraction layer
Evaluation platform
Robustly evaluate all different ways of performing recommendation
3/32
Kinds of Analysis
Recommendation Systems
Relational Machine Learning
Data Mining
MIR document retrieval
4/32
Talk Outline
Base Components
Queries
Algorithms
Schedulers
Graph-RAT Language
Conclusion and Examples
5/32
Base Components
Graphs
Actors
Links
Properties
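[Figure: example graph of actors A, B, C, D, E joined by links, with sample properties attached to one actor: Name = John, Age = 22, Hobbies = [Vector] Hiking, Biking. The label "Library" also appears in the figure.]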
6/32
Properties
Variables of Graph-RAT
Can be arbitrary Java types
Can be attached to anything
Unique ID string for each object
Accessed only as sets, not as objects
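The property model described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Graph-RAT API: the class names `PropertySketch` and `ActorSketch` and all of their methods are hypothetical, and the sample values are taken from the figure on the Base Components slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the Graph-RAT API.
class PropertySketch {
    private final String id;       // unique ID string for the property
    private final Class<?> type;   // any Java type may be stored
    private final List<Object> values = new ArrayList<>();

    PropertySketch(String id, Class<?> type) {
        this.id = id;
        this.type = type;
    }

    String getId() { return id; }

    // Reject values that do not match the declared Java type.
    void add(Object value) {
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException(id + " expects " + type.getName());
        }
        values.add(value);
    }

    // Values are exposed only as a collection, never as a single object.
    List<Object> getValues() { return Collections.unmodifiableList(values); }
}

class ActorSketch {
    private final String mode;
    private final String id;
    private final Map<String, PropertySketch> properties = new LinkedHashMap<>();

    ActorSketch(String mode, String id) {
        this.mode = mode;
        this.id = id;
    }

    String getMode() { return mode; }
    String getId() { return id; }

    void attach(PropertySketch property) { properties.put(property.getId(), property); }

    PropertySketch getProperty(String propertyId) { return properties.get(propertyId); }

    // Rebuilds the actor shown in the figure: Name, Age, and a vector of hobbies.
    static ActorSketch example() {
        ActorSketch john = new ActorSketch("User", "A");
        PropertySketch name = new PropertySketch("Name", String.class);
        name.add("John");
        PropertySketch age = new PropertySketch("Age", Integer.class);
        age.add(22);
        PropertySketch hobbies = new PropertySketch("Hobbies", String.class);
        hobbies.add("Hiking");
        hobbies.add("Biking");
        john.attach(name);
        john.attach(age);
        john.attach(hobbies);
        return john;
    }
}
```

Returning a collection even for single-valued properties such as Age mirrors the "accessed only as sets" point, at the cost of an extra lookup step for callers.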
7/32
Data View
Hyper-graph structure defined by the set of actors and links in a graph
Accessible from the enclosing graph
Can be cyclic
[Figure: example graph of actors A, B, C, D, E joined by links]
8/32
Metadata View
Not constructed by default
Implicit graph described by modes and the relations between them
Needed for relational machine learning
[Figure: metadata graph showing the mode User and the relation Friend]
9/32
Query Language
Constructs sets retrieved from a graph
Functional structure
Similar to SQL
4 types
– Graph Queries
– Actor Queries
– Link Queries
– Property Queries
10/32
Query Structure
Cascading queries in a LISP style syntax
Each child query is of a different type
Restrictions can be added at runtime
11/32
Query Examples
LinkByActor(false,
  ActorByMode(false, "Target", ".*"),
  ActorByMode(false, "Source", ".*"),
  SetOperation.XOR)
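The cascading structure of the query above can be sketched in Java as nested predicates. This is an illustration under stated assumptions, not Graph-RAT code: the types `Actor`, `Link`, and `SetOperation` are hypothetical stand-ins, the leading boolean is assumed to negate the query, and the set operation is assumed to combine the source-side and destination-side matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical stand-ins; none of these types are the Graph-RAT classes.
class QuerySketch {
    static final class Actor {
        final String mode;
        final String id;
        Actor(String mode, String id) { this.mode = mode; this.id = id; }
    }

    static final class Link {
        final Actor source;
        final Actor destination;
        Link(Actor source, Actor destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    enum SetOperation { AND, OR, XOR }

    // Actor query: match on mode and on a regular expression over the ID.
    // Assumption: the leading boolean negates the query.
    static Predicate<Actor> actorByMode(boolean not, String mode, String idPattern) {
        Pattern pattern = Pattern.compile(idPattern);
        Predicate<Actor> query = a -> a.mode.equals(mode) && pattern.matcher(a.id).matches();
        return not ? query.negate() : query;
    }

    // Link query built from two child actor queries.
    // Assumption: the set operation combines the two endpoint matches.
    static Predicate<Link> linkByActor(boolean not, Predicate<Actor> source,
                                       Predicate<Actor> destination, SetOperation op) {
        Predicate<Link> query = l -> {
            boolean s = source.test(l.source);
            boolean d = destination.test(l.destination);
            switch (op) {
                case AND: return s && d;
                case OR:  return s || d;
                default:  return s ^ d;   // XOR: exactly one endpoint matches
            }
        };
        return not ? query.negate() : query;
    }

    // Evaluate a link query against a list of links.
    static List<Link> run(Predicate<Link> query, List<Link> links) {
        List<Link> out = new ArrayList<>();
        for (Link l : links) {
            if (query.test(l)) out.add(l);
        }
        return out;
    }

    // Counts links where exactly one endpoint is in the "User" mode.
    static int demo() {
        Actor u1 = new Actor("User", "u1");
        Actor u2 = new Actor("User", "u2");
        Actor a1 = new Actor("Artist", "a1");
        List<Link> links = new ArrayList<>();
        links.add(new Link(u1, a1));   // User -> Artist
        links.add(new Link(u1, u2));   // User -> User
        links.add(new Link(a1, u2));   // Artist -> User
        Predicate<Link> query = linkByActor(false,
                actorByMode(false, "User", ".*"),
                actorByMode(false, "User", ".*"),
                SetOperation.XOR);
        return run(query, links).size();
    }
}
```

Because each query is just a value, extra restrictions can be composed onto it at run time, which is one way to read the "restrictions can be added at runtime" point on the previous slide.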
12/32
Query Comparisons
Similar to the Jena interface
Construction is similar to the JUNG system
Implements all SQL queries that do not require temporary tables
13/32
0.4.3 Query
Uses graph primitives instead of Queries
Algorithms use hard-coded GraphByID
14/32
Algorithms
Functions that execute over a given graph
Metadata is a part of the algorithm
Properties utilized or created are declared up front.
Except for output algorithms, no side effects are permitted.
execute(Graph graph)
IODescriptor getInput()
IODescriptor getOutput()
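A minimal sketch of what such an algorithm contract could look like follows. Only the names `execute`, `getInput`, `getOutput`, and `IODescriptor` come from the slide; the `Graph` stand-in, the descriptor fields, the list return types, and the `AgeInMonths` example are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; only execute, getInput, getOutput and the name
// IODescriptor come from the slide.
class AlgorithmSketch {
    // Minimal graph: actor ID -> (property ID -> value).
    static final class Graph {
        final Map<String, Map<String, Object>> actors = new HashMap<>();
    }

    // Names one property that the algorithm reads or writes.
    static final class IODescriptor {
        final String property;
        IODescriptor(String property) { this.property = property; }
    }

    interface Algorithm {
        void execute(Graph graph);
        // Assumption: a list is returned; the slide shows a single descriptor.
        List<IODescriptor> getInput();
        List<IODescriptor> getOutput();
    }

    // Reads "Age" and writes "AgeInMonths"; touches nothing else.
    static final class AgeInMonths implements Algorithm {
        public void execute(Graph graph) {
            for (Map<String, Object> props : graph.actors.values()) {
                Object age = props.get("Age");
                if (age instanceof Integer) {
                    props.put("AgeInMonths", (Integer) age * 12);
                }
            }
        }

        public List<IODescriptor> getInput() {
            return Arrays.asList(new IODescriptor("Age"));
        }

        public List<IODescriptor> getOutput() {
            return Arrays.asList(new IODescriptor("AgeInMonths"));
        }
    }

    // Builds a one-actor graph and runs the example algorithm over it.
    static Graph demo() {
        Graph g = new Graph();
        Map<String, Object> john = new HashMap<>();
        john.put("Age", 22);
        g.actors.put("A", john);
        new AgeInMonths().execute(g);
        return g;
    }
}
```

Declaring inputs and outputs up front lets a scheduler check, before running anything, that every property an algorithm reads is produced by an earlier step.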
15/32
Propositional Algorithms
Utilizes aggregator function as a parameter
Crosses all ways of shifting data
– Aggregate By Link
– Aggregate By Link Property
– Aggregate On Graph
– Graph To Actor
– Link To Graph
– Graph To Graph
16/32
Aggregator Functions
Reduce 1 or more elements to an equal or smaller number of elements
Examples
– Statistical Moments
– Arithmetic Operations
– Null Aggregation
– Concatenation
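The aggregator idea can be sketched as a single functional interface with a few implementations. The interface name `Aggregator`, the vector representation, and the reading of "null aggregation" as a pass-through are assumptions for illustration, not the Graph-RAT definitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; Graph-RAT's own aggregator interface may differ.
class AggregatorSketch {
    // Maps one or more vectors to an equal or smaller number of vectors.
    interface Aggregator {
        List<double[]> aggregate(List<double[]> input);
    }

    // First statistical moment: element-wise mean of equal-length vectors.
    static final Aggregator MEAN = input -> {
        double[] mean = new double[input.get(0).length];
        for (double[] v : input) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += v[i] / input.size();
            }
        }
        List<double[]> out = new ArrayList<>();
        out.add(mean);
        return out;
    };

    // Assumed meaning of "null aggregation": elements pass through unchanged.
    static final Aggregator PASS_THROUGH = input -> new ArrayList<>(input);

    // Concatenation: all vectors joined end to end into one vector.
    static final Aggregator CONCATENATE = input -> {
        int total = 0;
        for (double[] v : input) total += v.length;
        double[] joined = new double[total];
        int pos = 0;
        for (double[] v : input) {
            System.arraycopy(v, 0, joined, pos, v.length);
            pos += v.length;
        }
        List<double[]> out = new ArrayList<>();
        out.add(joined);
        return out;
    };

    static List<double[]> demoInput() {
        return Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0});
    }
}
```

Passing the aggregator as a parameter means one propositional algorithm, such as Aggregate By Link, can be reused with any of these reductions.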
17/32
Social Network Analysis Algorithms
Prestige Algorithms
– Degree
– Betweenness
– Closeness
– PageRank
– HITS
Graph Triples
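As an illustration of the simplest prestige measure listed above, the following sketch counts incoming links per actor (degree prestige). The class name `DegreeSketch` and the representation of links as ID pairs are assumptions, not Graph-RAT code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of degree prestige over a list of directed links.
class DegreeSketch {
    // Each link is a {source, destination} pair of actor IDs.
    static Map<String, Integer> inDegree(List<String[]> links) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] link : links) {
            degree.putIfAbsent(link[0], 0);            // sources with no incoming links score 0
            degree.merge(link[1], 1, Integer::sum);    // one more incoming link for the destination
        }
        return degree;
    }

    static Map<String, Integer> demo() {
        return inDegree(Arrays.asList(
                new String[] {"A", "B"},
                new String[] {"C", "B"},
                new String[] {"B", "D"}));
    }
}
```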
18/32
Classification Algorithms
Machine Learning Primitives
Uses Weka
Separate algorithms for training and classifying
19/32
Clustering Algorithms
Several graph-based algorithms
– Weak Component Clustering
– Strong Component Clustering
– Edge Betweenness Clustering
– Newman-Girvan Edge Betweenness
Also has primitives calling Weka on vector data
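Weak component clustering groups actors that are connected when link direction is ignored. One common way to compute it is a union-find structure, sketched below; this names a standard technique and is not necessarily how Graph-RAT implements it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical union-find sketch of weak component clustering.
class WeakComponentSketch {
    private final Map<String, String> parent = new HashMap<>();

    // Find the representative of an actor's component, compressing the path.
    String find(String actor) {
        parent.putIfAbsent(actor, actor);
        String root = actor;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String current = actor;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // Link direction is ignored: both endpoints end up in one component.
    void addLink(String source, String destination) {
        parent.put(find(source), find(destination));
    }

    int componentCount() {
        Set<String> roots = new HashSet<>();
        for (String actor : new ArrayList<>(parent.keySet())) {
            roots.add(find(actor));
        }
        return roots.size();
    }

    // Links A->B, C->B, D->E give the components {A, B, C} and {D, E}.
    static WeakComponentSketch demo() {
        WeakComponentSketch clusters = new WeakComponentSketch();
        clusters.addLink("A", "B");
        clusters.addLink("C", "B");
        clusters.addLink("D", "E");
        return clusters;
    }
}
```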
20/32
Similarity Algorithms
Comparisons between modes
Types of Similarity
– Similarity By Link
– Similarity By Property
– Graph Similarity
Distance Functions
– All Weka distance functions
– KLDistance
– Exponential Distance
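The slide does not define KLDistance. Assuming it refers to the Kullback-Leibler divergence, KL(p || q) = sum over i of p_i * ln(p_i / q_i), a minimal sketch looks like this. The class and method names are hypothetical, and Graph-RAT's version may differ, for example by symmetrizing or smoothing the result.

```java
// Hypothetical sketch of the standard Kullback-Leibler divergence.
class KLSketch {
    // KL(p || q) = sum_i p_i * ln(p_i / q_i) for two distributions of equal length.
    // Terms with p_i == 0 contribute 0; q_i == 0 with p_i > 0 gives infinity.
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] == 0.0) continue;
            if (q[i] == 0.0) return Double.POSITIVE_INFINITY;
            sum += p[i] * Math.log(p[i] / q[i]);
        }
        return sum;
    }
}
```

Note that this quantity is not symmetric in p and q, so it is a divergence rather than a true distance.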
21/32
Collaborative Filtering Algorithms
Traditional recommendation algorithms
– Item to Item
– User to User
– Associative Mining
22/32
Array-Based Algorithms
Transform To Array
Principal Component Analysis
23/32
Evaluation
All forms of evaluating results
– Set Based (precision and recall)
– Weighted Set (Correlations)
– Ordered Lists (Kendall Tau, Half Life)
Cross-Validation algorithms
– By Actor
– By Link
– By Graph
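The set-based evaluation mentioned above can be illustrated with the usual definitions: precision is the fraction of recommended items that are relevant, and recall is the fraction of relevant items that were recommended. The class and method names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of set-based evaluation (precision and recall).
class PrecisionRecallSketch {
    // Number of recommended items that are also relevant.
    static int hits(Set<String> recommended, Set<String> relevant) {
        Set<String> both = new HashSet<>(recommended);
        both.retainAll(relevant);
        return both.size();
    }

    static double precision(Set<String> recommended, Set<String> relevant) {
        return recommended.isEmpty() ? 0.0
                : (double) hits(recommended, relevant) / recommended.size();
    }

    static double recall(Set<String> recommended, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0
                : (double) hits(recommended, relevant) / relevant.size();
    }

    static Set<String> demoRecommended() {
        return new HashSet<>(Arrays.asList("a", "b", "c", "d"));
    }

    static Set<String> demoRelevant() {
        return new HashSet<>(Arrays.asList("b", "d", "e"));
    }
}
```

With the demo sets, two of the four recommendations are relevant and two of the three relevant items were found.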
24/32
Data Acquisition
Components for acquiring source data
File Reader Types
– Reading different file formats
Web Crawling Types
– LiveJournal or LastFM
Connection Types
– Links different sets together
25/32
Web Crawler
Custom multi-threaded web crawler
Dynamic parsers
Properties passing between both crawls and parser execution
Stop and filter conditions are parameterized
26/32
Existing Parsers
Base HTML parsing
XML Parsing (SAX)
LiveJournal FOAF
LastFM REST services
Graph-RAT documents
Yahoo search queries
27/32
Comparisons
SQL
LINQ
Matlab
Other graph packages
Prolog?
28/32
Embedded Use
Dynamic Loading
AbstractFactory abstract superclass
Example - Retrieving links to YouTube videos from GData
29/32
Graph-RAT Language
Base Graph-RAT:
– Data Acquisition components executed
– For each algorithm entry:
  – Graph Query selects a set of graphs
  – Algorithm is executed over each graph
Cross-Validation Graph-RAT:
– Mode, relation, or graph chosen in advance
– Data Acquisition components run once
– Algorithm entries rerun for each fold
Statistical Graph-RAT:
– List of cross-validation schedulers
– Statistical metrics of which performed better
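The base scheduling loop described above can be sketched as follows: data acquisition runs first, then each algorithm entry selects graphs with its query and runs once per selected graph. Everything here is a hypothetical stand-in: graphs are represented only by their ID strings, the graph query is a regular expression over IDs, and `SchedulerSketch` and `Entry` are not Graph-RAT classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical sketch of the base scheduling loop.
class SchedulerSketch {
    // One algorithm entry: a graph-ID pattern plus the work to run per graph.
    static final class Entry {
        final Pattern graphIds;
        final Consumer<String> algorithm;

        Entry(String pattern, Consumer<String> algorithm) {
            this.graphIds = Pattern.compile(pattern);
            this.algorithm = algorithm;
        }
    }

    static void run(List<Runnable> acquisition, List<Entry> entries, List<String> graphIds) {
        // Data acquisition components are executed first.
        for (Runnable component : acquisition) {
            component.run();
        }
        // For each entry, the query selects graphs and the algorithm runs over each.
        for (Entry entry : entries) {
            for (String id : graphIds) {
                if (entry.graphIds.matcher(id).matches()) {
                    entry.algorithm.accept(id);
                }
            }
        }
    }

    // Returns a log of what ran, in order.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<String> graphs = new ArrayList<>();
        List<Runnable> acquisition = new ArrayList<>();
        acquisition.add(() -> {
            graphs.add("root");
            graphs.add("fold-1");
            log.add("acquire");
        });
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(".*", id -> log.add("triples:" + id)));
        entries.add(new Entry("fold-.*", id -> log.add("degree:" + id)));
        run(acquisition, entries, graphs);
        return log;
    }
}
```

A cross-validation scheduler, as described on the slide, would wrap the inner loop so that the entries rerun once per fold while acquisition still runs only once.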
30/32
User To User Collaborative Filtering Example
Aggregate By Link (Artist->User)
Similarity By Link (User->User)
Aggregate By Link (User->User)
Property to Link (User->Artist)
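One possible reading of these four steps, written as plain Java rather than as Graph-RAT algorithm entries, is sketched below. The mapping is an interpretation: the `listens` map stands for the result of aggregating artists onto users, Jaccard similarity is a stand-in for whichever similarity function is configured, the weighted sum stands for the second aggregation, and the returned scores stand for the new User->Artist links. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of user-to-user collaborative filtering.
class UserToUserSketch {
    // Step 2 stand-in: Jaccard similarity between two users' artist sets.
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        Set<String> either = new HashSet<>(a);
        either.addAll(b);
        return either.isEmpty() ? 0.0 : (double) both.size() / either.size();
    }

    // Steps 3 and 4: sum similar users' artists, weighted by similarity,
    // and return scores for artists the user does not already have.
    static Map<String, Double> recommend(String user, Map<String, Set<String>> listens) {
        Set<String> own = listens.get(user);
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, Set<String>> other : listens.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = similarity(own, other.getValue());
            if (sim == 0.0) continue;
            for (String artist : other.getValue()) {
                if (!own.contains(artist)) {
                    scores.merge(artist, sim, Double::sum);
                }
            }
        }
        return scores;
    }

    // Step 1 stand-in: each user's artists already gathered into one set.
    static Map<String, Double> demo() {
        Map<String, Set<String>> listens = new HashMap<>();
        listens.put("alice", new HashSet<>(Arrays.asList("a1", "a2")));
        listens.put("bob", new HashSet<>(Arrays.asList("a1", "a2", "a3")));
        listens.put("carol", new HashSet<>(Arrays.asList("a4")));
        return recommend("alice", listens);
    }
}
```

In the demo, only bob shares artists with alice, so the single recommendation is a3, scored by their similarity of 2/3.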
31/32
Setup Example
<Scheduler class="BasicScheduler">
  <Graph>
    <MemGraph/>
  </Graph>
  …
</Scheduler>
33/32
Data Acquisition
<DataAcquisition>
  <Class>Crawl LastFM</Class>
  <Name>Crawl LastFM</Name>
  <MemGraph/>
  <Property>
    <Name>Proxy</Name>
    <Value>proxy.waikato.ac.nz</Value>
  </Property>
  …
</DataAcquisition>
34/32
Query Entry
<Algorithm>
  <Query>
    <GraphByID>
      <Pattern>.*</Pattern>
    </GraphByID>
  </Query>
</Algorithm>
Algorithm Entry
<Algorithm>
  <Query>…</Query>
  <Class>GraphTriples</Class>
  <Name>Graph Triples</Name>
  <Property>
    <Name>Relation</Name>
    <Value>Friends</Value>
  </Property>
  <Property>
    <Name>Destination</Name>
    <Value>TriplesVector</Value>
  </Property>
  …
</Algorithm>
36/32
Future Work
Stabilization - 0.5.1 to beta
Statistical testing on result sets
Upgrading the GUI interface
Memory performance upgrades
Octave Integration
37/32
Questions?
http://graph-rat.sourceforge.net
Stable (beta) release is 0.4.3