Page 1: Progress Report - Year 2 Extensions of the PhD Symposium Presentation Daniel McEnnis

Progress Report - Year 2

Extensions of the PhD Symposium Presentation

Daniel McEnnis

Page 2

Overview

Accomplishments:
- Data set acquisition and cleaning
- Theoretical achievements
- Graph-RAT improvements

Page 3

Current Data

1940s Jazz Recordings
- 2,000 annotated recordings from 80 CDs
- Covers nearly all popular music of the 1940s

LastFM by Song
- Retrieves tag and user information for each song
- User play counts still need data cleaning

Page 4

Planned Data Set Acquisition

Explored the DBTunes XML version of MySpace.

Linking with the LastFM data has been designed but not yet implemented.

It provides per-artist audio data for all recent artists.

Page 5

Theoretical Achievements

- Algorithm literature review
- Theoretical Computer Science journal submission
- NZCSRSC conference submission
- Recommendation tasks and evaluation metrics

Page 6

Algorithm Literature

Systematic exploration of theoretical computer science and discrete mathematics.

Found a 1973 SIAM paper describing a maximal clique algorithm.

This maximal clique algorithm is the most efficient one found.
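
The slide does not name the algorithm from the 1973 SIAM paper. Purely for orientation, here is a minimal sketch of the well-known Bron-Kerbosch maximal clique enumeration with pivoting; it is not necessarily the algorithm referred to above. The adjacency-map input and the class name `MaximalCliques` are assumptions made for illustration.

```java
import java.util.*;

/** Enumerates all maximal cliques of an undirected graph (Bron-Kerbosch with pivoting). */
final class MaximalCliques {
    private final Map<Integer, Set<Integer>> adj;
    private final List<Set<Integer>> cliques = new ArrayList<>();

    private MaximalCliques(Map<Integer, Set<Integer>> adj) { this.adj = adj; }

    /** adj maps each vertex to its neighbours; assumed symmetric (undirected) with no self-loops. */
    static List<Set<Integer>> find(Map<Integer, Set<Integer>> adj) {
        MaximalCliques mc = new MaximalCliques(adj);
        mc.expand(new HashSet<>(), new HashSet<>(adj.keySet()), new HashSet<>());
        return mc.cliques;
    }

    // r: current clique; p: vertices that can extend r; x: vertices already tried (prevents duplicates).
    private void expand(Set<Integer> r, Set<Integer> p, Set<Integer> x) {
        if (p.isEmpty() && x.isEmpty()) {      // r cannot be extended, so it is maximal
            cliques.add(new HashSet<>(r));
            return;
        }
        // Pivot: the vertex in p or x with the most neighbours in p, so fewer branches are explored.
        Integer pivot = null;
        int best = -1;
        for (Integer u : union(p, x)) {
            int count = intersect(p, neighbours(u)).size();
            if (count > best) { best = count; pivot = u; }
        }
        // Only vertices that are not neighbours of the pivot need to start a branch.
        Set<Integer> branches = new HashSet<>(p);
        branches.removeAll(neighbours(pivot));
        for (Integer v : branches) {
            r.add(v);
            expand(r, intersect(p, neighbours(v)), intersect(x, neighbours(v)));
            r.remove(v);
            p.remove(v);
            x.add(v);
        }
    }

    private Set<Integer> neighbours(Integer v) {
        return adj.getOrDefault(v, Collections.emptySet());
    }

    private static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> s = new HashSet<>(a);
        s.addAll(b);
        return s;
    }

    private static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> s = new HashSet<>(a);
        s.retainAll(b);
        return s;
    }

    public static void main(String[] args) {
        // Triangle 1-2-3 plus a pendant edge 3-4.
        Map<Integer, Set<Integer>> g = new HashMap<>();
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}, {3, 4}};
        for (int[] e : edges) {
            g.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            g.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        System.out.println(find(g)); // the maximal cliques {1, 2, 3} and {3, 4}, in some order
    }
}
```

The pivot rule only prunes branches that would rediscover the same cliques; the output set is the same without it.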

Page 7

Journal Submission

Submitted the Graph Triples Census algorithm, including:
- Proof of correctness
- Proof of time complexity
- Proof of space complexity

A rediscovery of an algorithm published in Social Networks in 2001

The most efficient implementation known
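
A triad census classifies every triple of actors by the links among them; on directed graphs the standard census distinguishes 16 triad types. As a simplified illustration only (neither the submitted algorithm nor the 2001 one), the sketch below computes an undirected census with four classes (0 to 3 edges) without enumerating all C(n, 3) triples. It counts triangles by neighbour intersection and derives the remaining classes from vertex degrees and the identity T1 + 2*T2 + 3*T3 = m(n - 2). The class name and input format are hypothetical.

```java
import java.util.*;

/** Undirected triad census: result[k] = number of vertex triples joined by exactly k edges. */
final class UndirectedTriadCensus {

    /** Vertices are 0..n-1; edges are {u, v} pairs, assumed simple (no self-loops or duplicates). */
    static long[] census(int n, int[][] edges) {
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new HashSet<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        long m = edges.length;

        // T3: triangles. Each u < v < w is visited once via edge (u, v) and a neighbour w of v.
        long t3 = 0;
        for (int u = 0; u < n; u++)
            for (int v : adj.get(u))
                if (v > u)
                    for (int w : adj.get(v))
                        if (w > v && adj.get(u).contains(w)) t3++;

        // Paths of length 2 (two edges sharing a centre vertex): the sum of C(deg, 2).
        long paths = 0;
        for (Set<Integer> nb : adj) {
            long d = nb.size();
            paths += d * (d - 1) / 2;
        }
        long t2 = paths - 3 * t3;                 // every triangle contains three such paths

        // Each edge lies in n - 2 triples, so T1 + 2*T2 + 3*T3 = m*(n - 2).
        long t1 = m * (n - 2) - 2 * t2 - 3 * t3;
        long all = (long) n * (n - 1) * (n - 2) / 6;
        long t0 = all - t1 - t2 - t3;
        return new long[] {t0, t1, t2, t3};
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 plus edge 2-3 on four vertices.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {2, 3}};
        System.out.println(Arrays.toString(census(4, edges))); // [0, 1, 2, 1]
    }
}
```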

Page 8

NZCSRSC

Presented a poster at the conference, written as a short user's guide

Page 9

Evaluation Exploration

Incorporating cross-validation into evaluation on relational data.

9 types of music recommendation, distinguished by:
- Personalized vs. generic
- Open query vs. targeted query
- Dynamic vs. static data
- New music vs. all music

Page 10

Personalized Radio

Open query with personalized presentation

- Static vs. dynamic data
- Predicting new items vs. predicting any item

Page 11

Targeted Search

- Not personalized
- Similarity queries
- Automatically generating targeted lists for a browsing hierarchy
- New music vs. all music
- Static vs. dynamic data

Page 12

Personalized Tag Radio

Creates a personalized playlist matching a given query

- New music vs. all music
- Static vs. dynamic data

Page 13

Excluded Types

‘Top 40’ prediction: rendered obsolete by the other types

Page 14

Cross-Validation in Graphs

Actor removal
- The only form currently used
- All links to a particular actor are removed

Link removal
- Selected links from the ground truth are removed
- The algorithm is evaluated on reproducing the missing links
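
Both forms of graph cross-validation can be sketched over a plain set of links. This is a minimal sketch, assuming a hypothetical `Link` record rather than Graph-RAT's own classes (records need Java 16+). Recall is used here as one possible score for link removal; it is not necessarily the metric used in this work.

```java
import java.util.*;
import java.util.stream.Collectors;

final class GraphCrossValidation {

    /** Actor removal: every link that touches one of the chosen actors is held out. */
    static Fold actorRemoval(Set<Link> links, Set<String> actors) {
        Map<Boolean, Set<Link>> parts = links.stream().collect(Collectors.partitioningBy(
                l -> actors.contains(l.source()) || actors.contains(l.destination()),
                Collectors.toSet()));
        return new Fold(parts.get(false), parts.get(true));
    }

    /** Link removal: a random fraction (0 to 1) of the ground-truth links is held out. */
    static Fold linkRemoval(Set<Link> links, double fraction, long seed) {
        List<Link> shuffled = new ArrayList<>(links);
        shuffled.sort(Comparator.comparing(Link::toString));   // fixed order, so the seed alone decides
        Collections.shuffle(shuffled, new Random(seed));
        int k = (int) Math.round(fraction * shuffled.size());
        return new Fold(new HashSet<>(shuffled.subList(k, shuffled.size())),
                        new HashSet<>(shuffled.subList(0, k)));
    }

    /** One possible score: the share of held-out links that an algorithm predicted. */
    static double recall(Set<Link> predicted, Set<Link> heldOut) {
        if (heldOut.isEmpty()) return 1.0;
        long hits = heldOut.stream().filter(predicted::contains).count();
        return (double) hits / heldOut.size();
    }

    public static void main(String[] args) {
        Set<Link> links = Set.of(new Link("a", "b"), new Link("b", "c"),
                                 new Link("c", "d"), new Link("d", "a"));
        System.out.println("actor removal holds out " + actorRemoval(links, Set.of("a")).heldOut());
        System.out.println("link removal holds out " + linkRemoval(links, 0.5, 42L).heldOut());
    }
}

/** Hypothetical directed link between two actors (not Graph-RAT's own class). */
record Link(String source, String destination) {}

/** One cross-validation fold: links kept for training and links hidden for testing. */
record Fold(Set<Link> training, Set<Link> heldOut) {}
```

Actor removal hides whole neighbourhoods, which suits cold-start questions; link removal keeps every actor present and asks only whether the hidden edges can be recovered.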

Page 15

Graph-RAT Improvements

Release of 0.4.4
- Finalized Graph-RAT as a relational programming language
- Added propositional algorithms

Release of 0.5.0
- New query subsystem
- Usability enhancements
- Space complexity improvements

Page 16

Aggregators

- 8 algorithms with 9 helper functions
- Cover each form of propositionalization
- Cover mappings between links and properties
- Core primitives for Graph-RAT as a programming language
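
Propositionalization turns relational structure into fixed-length per-actor attributes that a propositional learner can consume. A minimal sketch, assuming actors identified by strings and one numeric property per actor (both hypothetical, not Graph-RAT's aggregator API): each actor's outgoing links are summarized as the count, mean, and maximum of the linked actors' property values.

```java
import java.util.*;

/** Propositionalization sketch: summarise each actor's linked neighbours as fixed-length features. */
final class NeighbourAggregator {

    /**
     * links: actor -> actors it links to; value: a numeric property per actor (both hypothetical).
     * Returns actor -> {count, mean, max} of the property over the linked actors that have it.
     */
    static Map<String, double[]> aggregate(Map<String, List<String>> links, Map<String, Double> value) {
        Map<String, double[]> features = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            DoubleSummaryStatistics stats = e.getValue().stream()
                    .filter(value::containsKey)        // skip neighbours without the property
                    .mapToDouble(value::get)
                    .summaryStatistics();
            boolean empty = stats.getCount() == 0;     // no usable neighbours: report zeros
            features.put(e.getKey(), new double[] {
                    stats.getCount(),
                    empty ? 0.0 : stats.getAverage(),
                    empty ? 0.0 : stats.getMax()});
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"), "b", List.of("c"), "c", List.of());
        Map<String, Double> plays = Map.of("b", 2.0, "c", 4.0);
        aggregate(links, plays).forEach((actor, f) -> System.out.println(actor + " " + Arrays.toString(f)));
    }
}
```

Any other aggregate (sum, mode, a histogram) slots in the same way; the essential step is mapping a variable-size set of links to a fixed-size vector.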

Page 17

Similarity

- 2 new similarity algorithms
- 1 new distance metric

Page 18

Query Subsystem

28 primitives for searching in a graph:
- 10 graph primitives
- 7 actor primitives
- 7 link primitives
- 4 property primitives

Functional: queries are built by composing primitives
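
Composing query primitives functionally can be sketched with `java.util.function.Function`: each primitive maps a stream of graph objects to another stream, and `andThen` chains them into a query. The `TypedLink` record and the primitive names `linksFrom`, `ofType`, and `destinations` are hypothetical and do not correspond to the 28 Graph-RAT primitives (records need Java 16+).

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

final class QueryComposition {

    // Link primitive (hypothetical): keep links whose source is one of the given actors.
    static Function<Stream<TypedLink>, Stream<TypedLink>> linksFrom(Set<String> actors) {
        return links -> links.filter(l -> actors.contains(l.source()));
    }

    // Link primitive (hypothetical): keep links of one type.
    static Function<Stream<TypedLink>, Stream<TypedLink>> ofType(String type) {
        return links -> links.filter(l -> l.type().equals(type));
    }

    // Actor primitive (hypothetical): map links to their distinct destination actors.
    static Function<Stream<TypedLink>, Stream<String>> destinations() {
        return links -> links.map(TypedLink::destination).distinct();
    }

    public static void main(String[] args) {
        List<TypedLink> graph = List.of(
                new TypedLink("listensTo", "alice", "song1"),
                new TypedLink("friendOf", "alice", "bob"),
                new TypedLink("listensTo", "bob", "song2"),
                new TypedLink("listensTo", "alice", "song3"));
        // "Songs alice listens to", built by composing three primitives.
        Function<Stream<TypedLink>, Stream<String>> query =
                linksFrom(Set.of("alice")).andThen(ofType("listensTo")).andThen(destinations());
        System.out.println(query.apply(graph.stream()).collect(Collectors.toList())); // [song1, song3]
    }
}

/** Hypothetical typed link used only in this sketch. */
record TypedLink(String type, String source, String destination) {}
```

Because each primitive is an ordinary function value, new queries need no new code, only new compositions.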

Page 19

Performance Specs

Queries can return collections or iterators.

Collections
- Implemented as references into graphs
- Space linear in the number of references

Iterators
- Ordered sequences of objects
- Constant space complexity (excluding Graph ID and AllGraphs)
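
The space difference between the two return forms can be illustrated with a single filter written both ways. In this sketch (plain Java; the class name `LazyFilter` is hypothetical), the list version keeps a reference to every match, while the iterator holds only the source iterator and one look-ahead element, so its extra space does not grow with the number of results.

```java
import java.util.*;
import java.util.function.Predicate;

final class LazyFilter {

    // Collection form: stores a reference to every match, so space grows with the result size.
    static <T> List<T> filterToList(Iterable<T> source, Predicate<? super T> keep) {
        List<T> out = new ArrayList<>();
        for (T t : source) if (keep.test(t)) out.add(t);
        return out;
    }

    // Iterator form: holds only the source iterator and one look-ahead element (constant extra space).
    static <T> Iterator<T> filterToIterator(Iterable<T> source, Predicate<? super T> keep) {
        Iterator<T> it = source.iterator();
        return new Iterator<T>() {
            private T lookahead;
            private boolean ready;            // true when lookahead holds a match not yet returned

            @Override public boolean hasNext() {
                while (!ready && it.hasNext()) {
                    T candidate = it.next();
                    if (keep.test(candidate)) { lookahead = candidate; ready = true; }
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T result = lookahead;
                lookahead = null;             // drop the reference once it has been handed out
                ready = false;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(filterToList(xs, x -> x % 2 == 0)); // [2, 4, 6]
        for (Iterator<Integer> it = filterToIterator(xs, x -> x % 2 == 0); it.hasNext(); )
            System.out.print(it.next() + " ");
    }
}
```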

Page 20

Usability Enhancements

- Properties and metadata
- Interface enhancements
- Dynamic loading of classes
- XML scripting support

Page 21

Properties and Metadata

Properties description
- Encapsulates all parameter code
- Utilizes Graph-RAT Property objects
- Comparison to JavaBeans

New metadata model
- Parameter model update
- Input/output descriptors update

Page 22

Interface Updates

Arrays replaced by Lists
- Graph, link, actor, and property objects

Iterators
- All graph operations support iterators

Page 23

Dynamic Loading

- Classes loaded from file at runtime
- Loading controlled by a call to the loader object
- Automatic registration with the relevant factories
- All factories updated to support dynamic loading
- Extend Abstract Factory
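
The loading scheme described above can be sketched with `java.net.URLClassLoader`: a loader object loads named classes from a directory or JAR only when asked, then offers each class to every factory, and each factory keeps the classes that match its product type. `PluginLoader`, `AbstractFactory`, and their methods are hypothetical names, not Graph-RAT's API.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.*;

/** Hypothetical loader object: nothing is loaded until load() is called. */
final class PluginLoader {
    private final List<? extends AbstractFactory<?>> factories;

    PluginLoader(List<? extends AbstractFactory<?>> factories) { this.factories = factories; }

    /** Loads the named classes from a directory or JAR and offers each one to every factory. */
    void load(File location, List<String> classNames) throws Exception {
        // Left open on purpose: loaded classes may later need further classes from the same location.
        URLClassLoader loader = new URLClassLoader(
                new URL[] {location.toURI().toURL()}, PluginLoader.class.getClassLoader());
        for (String name : classNames) {
            Class<?> cls = Class.forName(name, true, loader);
            for (AbstractFactory<?> factory : factories) factory.register(cls);
        }
    }

    public static void main(String[] args) throws Exception {
        AbstractFactory<CharSequence> texts = new AbstractFactory<CharSequence>(CharSequence.class) {};
        // Any class visible to the loader works; a JDK class keeps the demo self-contained.
        new PluginLoader(List.of(texts)).load(new File("."), List.of("java.lang.StringBuilder"));
        System.out.println(texts.create("StringBuilder").getClass().getName()); // java.lang.StringBuilder
    }
}

/** Hypothetical base factory: keeps a name -> class registry for one product type. */
abstract class AbstractFactory<T> {
    private final Map<String, Class<? extends T>> registry = new HashMap<>();
    private final Class<T> productType;

    protected AbstractFactory(Class<T> productType) { this.productType = productType; }

    /** Registers cls if it is a subtype of this factory's product; returns whether it was accepted. */
    boolean register(Class<?> cls) {
        if (!productType.isAssignableFrom(cls)) return false;
        registry.put(cls.getSimpleName(), cls.asSubclass(productType));
        return true;
    }

    /** Instantiates a registered class through its no-argument constructor. */
    T create(String name) throws ReflectiveOperationException {
        Class<? extends T> cls = registry.get(name);
        if (cls == null) throw new ClassNotFoundException(name);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Letting each factory decide what it accepts keeps the loader ignorant of component types, so adding a new kind of component only needs a new factory subclass.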

Page 24

XML Scripting support

SAX parser support for all components except crawling and parsing

Implemented using the Builder pattern
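
SAX parsing pairs naturally with the Builder pattern: the handler acts as the director, translating streamed parse events into builder calls, and the builder assembles the finished object once parsing ends. The XML vocabulary (`script`, `algorithm`, `param`) and the `Step`, `ScriptBuilder`, and `ScriptHandler` types are invented for this sketch and are not Graph-RAT's scripting format (records and arrow-style `switch` need Java 16+).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Director: a SAX handler that turns streamed parse events into builder calls. */
final class ScriptHandler extends DefaultHandler {
    private final ScriptBuilder builder;

    ScriptHandler(ScriptBuilder builder) { this.builder = builder; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (qName) {
            case "algorithm" -> builder.beginAlgorithm(atts.getValue("class"));
            case "param" -> builder.addParameter(atts.getValue("name"), atts.getValue("value"));
            default -> { }                    // e.g. the <script> root element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("algorithm")) builder.endAlgorithm();
    }

    static List<Step> parse(String xml) throws Exception {
        ScriptBuilder builder = new ScriptBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new ScriptHandler(builder));
        return builder.build();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<script>"
                + "<algorithm class=\"Similarity\"><param name=\"k\" value=\"10\"/></algorithm>"
                + "<algorithm class=\"Aggregator\"/>"
                + "</script>";
        parse(xml).forEach(System.out::println);
    }
}

/** Builder: receives the parts of a script one at a time and assembles the finished product. */
final class ScriptBuilder {
    private final List<Step> steps = new ArrayList<>();
    private String algorithm;
    private Map<String, String> parameters;

    void beginAlgorithm(String name) { algorithm = name; parameters = new LinkedHashMap<>(); }
    void addParameter(String key, String value) { parameters.put(key, value); }
    void endAlgorithm() { steps.add(new Step(algorithm, Collections.unmodifiableMap(parameters))); }
    List<Step> build() { return List.copyOf(steps); }
}

/** One configured step of a hypothetical script: an algorithm name plus its parameters. */
record Step(String algorithm, Map<String, String> parameters) {}
```

Because the handler never holds the whole document, memory use depends on the size of the script being built rather than on the size of the XML text.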

Page 25

Core Improvements

- 2 cross-validation algorithms
- ~20 algorithms with space complexity improvements
- Iterators for all graph primitives
- Macros for separating graph data by cross-validation property

Page 26

Additional algorithms

2 new similarity algorithms

1 new distance metric added

Obsolete algorithms removed

Page 27

LastFM crawler updates

LastFM upgraded its web services and removed the old version

The new version will link to the semantic web

~20 parsers completed; still under construction

Page 28

Planned Future Work

Contingent on arrival of computer:
- Testing of existing code
- Cross-validation scheduler
- Completion of LastFM parser
- DBTunes (from semantic web) parser
- Experiments!
- Write thesis!

Page 29

Unplanned Future Work

- Full semantic web crawler
- Incorporating GData protocols
- Database backend
- Colt-Matrix-Over-Graph adapter
- Database-backed Weka instance

Page 30

Beyond the Horizon

- Support for Prolog primitives
- Multi-database graph support
- Semantic Web graph utilizing the proxy pattern
- Support for dynamic updates and dynamic data

