Progress Report - Year 2
Extensions of the PhD Symposium Presentation
Daniel McEnnis

Overview
Accomplishments
  Data set acquisition and cleaning
  Theoretical achievements
  Graph-RAT improvements

Current Data
1940s Jazz Recordings
  2,000 annotated recordings from 80 CDs
  Covers nearly all popular music of the 1940s
LastFM by Song
  Retrieves tag and user information for each song
  User playcounts still need data cleaning

Planned Data Set Acquisition
Explored DBTunes, an XML version of MySpace.
Linking with the LastFM data has been designed but not yet implemented.
Provides per-artist audio data for all recent artists.

Theoretical Achievements
Algorithm Literature Review
Theoretical Computer Science journal submission
NZCSRSC conference submission
Recommendation Tasks and Evaluation Metrics

Algorithm Literature
Systematic exploration of theoretical computer science and discrete mathematics.
Found a 1973 SIAM paper describing a maximal clique algorithm.
This is the most efficient maximal clique algorithm found in the review.
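A maximal clique is a set of mutually linked actors that no further actor can join. For orientation, the Python sketch below enumerates all maximal cliques using the well-known Bron–Kerbosch recursion with pivoting. It is a stand-in chosen for illustration, not necessarily the algorithm from the 1973 paper, and the input format (a dict mapping each actor to its set of neighbours) is an assumption.

```python
def maximal_cliques(adj):
    """Enumerate every maximal clique of an undirected graph.

    adj maps each actor to the set of its neighbours (no self-loops).
    Bron-Kerbosch recursion with pivoting:
      r = clique built so far,
      p = candidates that could still extend r,
      x = actors already explored that would also extend r.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            # Nothing can extend r, so r is a maximal clique.
            cliques.append(r)
            return
        # Pivot on the actor with the most neighbours in p; only actors
        # that are NOT neighbours of the pivot need their own branch.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques
```

For the graph `{"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}` this yields the two cliques {a, b, c} and {c, d}. Pivoting is used because it prunes branches that could only rediscover cliques already found.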

Journal Submission
Submitted the Graph Triples Census algorithm
  Proof of correctness
  Proof of time complexity
  Proof of space complexity
A rediscovery of a 2001 algorithm published in Social Networks
The most efficient implementation known
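A triples (triad) census counts, for every unordered set of three actors, which pattern of links joins them. The Python sketch below is a deliberately simplified, undirected version with four triad types (0 to 3 links) instead of the 16 directed types; it is not the submitted algorithm or its proofs. It illustrates the edge-centred idea: visit each link once, count only triads that contain at least one link, and obtain the empty triads by subtracting from C(n, 3). Nodes are assumed to be the integers 0..n-1.

```python
from math import comb


def undirected_triad_census(n, edges):
    """Count triads of an undirected graph on nodes 0..n-1 by number of links.

    Returns (t0, t1, t2, t3): triads with 0, 1, 2 and 3 links.
    Work is proportional to (number of links) x (maximum degree).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    t1 = t2 = t3 = 0
    for u in range(n):
        for v in adj[u]:
            if v <= u:
                continue  # handle each link once, as (u, v) with u < v
            s = (adj[u] | adj[v]) - {u, v}
            # Third actors linked to neither endpoint: only the u-v link.
            t1 += n - len(s) - 2
            for w in s:
                # Canonical rule so each 2- or 3-link triad is counted once.
                if v < w or (u < w < v and w not in adj[u]):
                    if w in adj[u] and w in adj[v]:
                        t3 += 1
                    else:
                        t2 += 1
    t0 = comb(n, 3) - t1 - t2 - t3
    return t0, t1, t2, t3
```

On four actors with links 0-1, 1-2, 0-2 and 2-3, the census is one triangle, two open two-link triads, one single-link triad and no empty triads. Deriving the empty triads by subtraction is what keeps the cost independent of the (usually huge) number of unlinked triples in sparse graphs.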

NZCSRSC
Poster at the conference
Written as a short user's guide

Evaluation Exploration
Incorporating cross-validation into relational data.
9 types of music recommendation, varying along:
  Personalized versus generic
  Open query versus targeted query
  Dynamic versus static data
  New music versus all music

Personalized Radio
Open query with personalized presentation
Static data vs dynamic data
Predicting new items vs predicting anything

Targeted Search
Not personalized
Similarity queries
Automatically generating targeted lists for a browsing hierarchy
New music vs all music
Static vs dynamic data

Personalized Tag Radio
Create a personalized playlist matching a given query
New music vs all music
Static vs dynamic data

Excluded Types
‘Top 40’ prediction
  Rendered obsolete by the other types

Cross-Validation in Graphs
Actor removal
  Only form currently used
  All links to a particular actor are removed
Link removal
  Selected links from the ground truth are removed
  The algorithm is evaluated on reproducing the missing links
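The two hold-out schemes above differ in what is hidden: actor removal hides every link touching the held-out actors, while link removal hides only selected ground-truth links and keeps every actor in the graph. A minimal Python sketch of both follows, assuming links are (source, destination) pairs and that folds are assigned by shuffling with a fixed seed; all function names are hypothetical and are not Graph-RAT's cross-validation algorithms.

```python
import random


def assign_folds(items, k, seed=0):
    """Shuffle items (actors or links) and deal them round-robin into k folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return {item: i % k for i, item in enumerate(items)}


def actor_removal_split(links, actor_fold, test_fold):
    """Actor removal: hold out every link touching an actor in the test fold."""
    train, test = [], []
    for source, dest in links:
        held_out = actor_fold[source] == test_fold or actor_fold[dest] == test_fold
        (test if held_out else train).append((source, dest))
    return train, test


def link_removal_split(links, link_fold, test_fold):
    """Link removal: hold out only the selected links; all actors remain."""
    train = [link for link in links if link_fold[link] != test_fold]
    test = [link for link in links if link_fold[link] == test_fold]
    return train, test
```

Under link removal, an algorithm is trained on `train` and scored on how many links in `test` it reproduces; under actor removal, the held-out actors have no training links at all, which is the harder cold-start setting.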

Graph-RAT Improvements
Release of 0.4.4
  Finalized Graph-RAT as a relational programming language
  Added propositional algorithms
Release of 0.5.0
  New query subsystem
  Usability enhancements
  Space complexity improvements

Aggregators
8 algorithms with 9 helper functions
Cover each form of propositionalization
Cover mappings between links and properties
Core primitives for Graph-RAT as a programming language
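Propositionalization turns relational (one-to-many) data into single-valued properties that ordinary learners can use, for example reducing all songs a user is linked to into one number per user. The Python sketch below shows the idea with a small, hypothetical set of aggregators (count, sum, mean, min, max, mode); it does not reproduce Graph-RAT's 8 aggregator algorithms or their helper functions.

```python
from collections import Counter
from statistics import mean

# Hypothetical aggregator set: each reduces a list of values to one value.
AGGREGATORS = {
    "count": len,
    "sum": sum,
    "mean": mean,
    "min": min,
    "max": max,
    "mode": lambda values: Counter(values).most_common(1)[0][0],
}


def propositionalize(links, target_values, aggregator):
    """Map a link-based relation onto one property value per source actor.

    links: iterable of (source, target) actor pairs.
    target_values: maps a target actor to the property being aggregated.
    Returns {source: aggregated value}; sources whose targets carry no
    value get no entry.
    """
    grouped = {}
    for source, target in links:
        if target in target_values:
            grouped.setdefault(source, []).append(target_values[target])
    reduce_fn = AGGREGATORS[aggregator]
    return {source: reduce_fn(values) for source, values in grouped.items()}
```

For example, with listening links `[("u1", "s1"), ("u1", "s2"), ("u2", "s2")]` and an illustrative per-song value `{"s1": 120, "s2": 180}`, the `"mean"` aggregator gives `u1` the value 150 and `u2` the value 180.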

Similarity
2 new similarity algorithms
1 new distance metric

Query Subsystem
28 primitives for searching in a graph
  10 graph primitives
  7 actor primitives
  7 link primitives
  4 property primitives
Functional: queries are built by composing primitives
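Building a query by composition means each primitive takes a stream of graph objects and returns a new stream, so primitives can be chained freely. The Python sketch below illustrates this with a handful of hypothetical primitives (one graph, two link, one property and one actor primitive); the names, the `Link` record and the dict-based graph are assumptions, not the 28 primitives of Graph-RAT's query subsystem.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Link:
    relation: str
    source: str
    destination: str


# Each primitive maps an iterable to a lazy iterator.
Stage = Callable[[Iterable], Iterator]


def all_links(graph: dict) -> Iterator[Link]:
    """Graph primitive: every link stored in the graph."""
    return iter(graph["links"])


def links_of_relation(relation: str) -> Stage:
    """Link primitive: keep only links of the given relation."""
    return lambda links: (link for link in links if link.relation == relation)


def destinations(links: Iterable[Link]) -> Iterator[str]:
    """Link primitive: move from links to the actors they point at."""
    return (link.destination for link in links)


def with_property(props: dict, name: str, value) -> Stage:
    """Property primitive: keep actors whose property `name` equals `value`."""
    return lambda actors: (a for a in actors if props.get(a, {}).get(name) == value)


def distinct(items: Iterable) -> Iterator:
    """Actor primitive: drop repeated actors while preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def compose(*stages: Callable) -> Callable:
    """Chain primitives left to right into a single query function."""
    def query(source):
        for stage in stages:
            source = stage(source)
        return source
    return query
```

A query such as "1940s songs that anyone listens to" is then `compose(all_links, links_of_relation("listensTo"), destinations, with_property(props, "decade", "1940s"), distinct)`, applied to a graph. Because every stage is lazy, the composed query also returns an iterator rather than a materialized collection.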

Performance Specs
Queries can return collections or iterators.
Collections
  Implemented as references into graphs
  Space linear in the number of references
Iterators
  Ordered sequences of objects
  Constant space complexity (excluding Graph ID and AllGraphs)
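The space difference between the two return styles can be seen in a toy example: a collection stores one reference per match, while an iterator keeps only its position. The Python sketch below uses a hypothetical `ActorStore` class, not Graph-RAT's API, and does not model the Graph ID and AllGraphs exceptions noted above.

```python
import sys
from typing import Callable, Iterator, List


class ActorStore:
    """Toy stand-in for a graph's actor table (hypothetical)."""

    def __init__(self, actors):
        self._actors = list(actors)

    def query_collection(self, predicate: Callable) -> List:
        # Materializes every match: extra memory grows linearly with the
        # number of references returned.
        return [a for a in self._actors if predicate(a)]

    def query_iterator(self, predicate: Callable) -> Iterator:
        # Yields matches one at a time in storage order: constant extra
        # memory regardless of how many actors match.
        return (a for a in self._actors if predicate(a))


def is_even(actor) -> bool:
    return actor % 2 == 0


store = ActorStore(range(100_000))
collection_bytes = sys.getsizeof(store.query_collection(is_even))
iterator_bytes = sys.getsizeof(store.query_iterator(is_even))
```

Here `collection_bytes` grows with the 50,000 matches while `iterator_bytes` stays small and fixed. The trade-off is that an iterator can be traversed only once and offers no random access, so a caller that needs either must still materialize a collection.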

Usability Enhancements
Properties and Metadata
Interface enhancements
Dynamic Loading of Classes
XML scripting support

Properties and Metadata
Properties description
  Encapsulates all parameter code
  Uses Graph-RAT Property objects
  Comparison to JavaBeans
New metadata model
  Updated parameter model
  Updated input/output descriptors

Interface Updates
Arrays replaced by Lists
  For graph, link, actor, and property objects
Iterators
  All graph operations support iterators

Dynamic Loading
Classes loaded from file at runtime
Loading controlled by a call to a loader object
Automatic registration with the relevant factories
All factories updated to support dynamic loading
  Factories extend Abstract Factory
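The loading flow above (read classes from a file at runtime, then register each one with the factory responsible for its kind) can be sketched in Python with the standard `importlib.util` module. The `Factory` class, the factory names and the `COMPONENT_TYPE` marker attribute are hypothetical conventions for this sketch; Graph-RAT's own loader and factories will differ.

```python
import importlib.util
import os
import tempfile


class Factory:
    """Hypothetical factory: maps component names to classes."""

    def __init__(self):
        self._registry = {}

    def register(self, name, cls):
        self._registry[name] = cls

    def create(self, name, **params):
        return self._registry[name](**params)


# One factory per component family (family names are illustrative).
FACTORIES = {"algorithm": Factory(), "aggregator": Factory()}


def load_components(path):
    """Load a Python source file at runtime and auto-register its components.

    Assumed convention: a component class declares COMPONENT_TYPE naming
    the factory it belongs to.
    """
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for obj in vars(module).values():
        kind = getattr(obj, "COMPONENT_TYPE", None)
        if isinstance(obj, type) and kind in FACTORIES:
            FACTORIES[kind].register(obj.__name__, obj)
    return module


# Example: write a small plug-in to a temporary file, then load it.
PLUGIN_SOURCE = (
    "class PlaycountMean:\n"
    "    COMPONENT_TYPE = 'aggregator'\n"
    "    def __init__(self, minimum=0):\n"
    "        self.minimum = minimum\n"
)
with tempfile.TemporaryDirectory() as tmp:
    plugin_path = os.path.join(tmp, "my_plugin.py")
    with open(plugin_path, "w") as handle:
        handle.write(PLUGIN_SOURCE)
    load_components(plugin_path)
```

After loading, `FACTORIES["aggregator"].create("PlaycountMean", minimum=5)` builds the plug-in class without the core code ever naming it, which is the point of combining runtime loading with factory registration.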

XML Scripting support
SAX parser support for all components except crawling and parsing
Implemented using the Builder pattern
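Pairing a streaming SAX parser with the Builder pattern means the parser only reports elements as they arrive, while a separate builder object accumulates them into a finished configuration. The Python sketch below uses the standard `xml.sax` module; the `<component>`/`<parameter>` element names, their attributes and the `ScriptBuilder` class are hypothetical and not Graph-RAT's actual script format.

```python
import xml.sax


class ScriptBuilder:
    """Hypothetical builder: assembles component descriptions step by step."""

    def __init__(self):
        self._components = []
        self._current = None

    def start_component(self, kind, cls):
        self._current = {"kind": kind, "class": cls, "params": {}}

    def add_parameter(self, name, value):
        self._current["params"][name] = value

    def end_component(self):
        self._components.append(self._current)
        self._current = None

    def build(self):
        return list(self._components)


class ScriptHandler(xml.sax.ContentHandler):
    """SAX callbacks forward each element to the builder as it streams past."""

    def __init__(self, builder):
        super().__init__()
        self.builder = builder

    def startElement(self, name, attrs):
        if name == "component":
            self.builder.start_component(attrs["type"], attrs["class"])
        elif name == "parameter":
            self.builder.add_parameter(attrs["name"], attrs["value"])

    def endElement(self, name):
        if name == "component":
            self.builder.end_component()


def load_script(xml_text):
    """Parse an XML script and return the list of configured components."""
    builder = ScriptBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), ScriptHandler(builder))
    return builder.build()
```

Keeping the handler thin and putting the assembly logic in the builder lets the same builder be driven by a different front end (for example, programmatic calls) without touching the parsing code.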

Core Improvements
2 cross-validation algorithms
~20 algorithms with space complexity improvements
Iterators for all graph primitives
Macros for separating graph data by cross-validation property

Additional algorithms
2 new similarity algorithms
1 new distance metric added
Obsolete algorithms removed

LastFM crawler updates
LastFM upgraded its web services and removed the old version
The new version will link to the semantic web
~20 parsers completed
Still under construction

Planned Future Work
Contingent on the arrival of the computer
Testing of existing code
Cross-Validation Scheduler
Completion of LastFM parser
DBTunes (from semantic web) parser
Experiments!
Write thesis!

Unplanned Future Work
Full semantic web crawler
Incorporating GData protocols
Database backend
Colt-Matrix-Over-Graph adapter
Database-backed Weka instance

Beyond the Horizon
Support for Prolog primitives
Multi-database graph support
Semantic Web graph utilizing the proxy pattern
Support for dynamic updates and dynamic data