Progress Report - Year 2: Extensions of the PhD Symposium Presentation - Daniel McEnnis

Post on 08-Jan-2018


Progress Report - Year 2

Extensions of the PhD Symposium Presentation

Daniel McEnnis

Overview

Accomplishments
  Data set acquisition and cleaning
  Theoretical achievements
  Graph-RAT improvements

Current Data

40’s Jazz Recordings
  2000 annotated recordings from 80 CDs
  Covers nearly all 40’s popular music

LastFM by Song
  Retrieves tag and user info by song
  Data cleaning on user playcounts needed

Planned Data Set Acquisition

Explored the DBTunes XML version of MySpace.

Linking with LastFM data designed but not yet written.

Provides per-artist audio data for all recent artists.

Theoretical Achievements

Algorithm Literature Review
Theoretical Computer Science journal submission
NZCSRSC conference submission
Recommendation Tasks and Evaluation Metrics

Algorithm Literature

Systematic exploration of theoretical computer science and discrete mathematics.

Discovered 1973 SIAM paper for maximal clique algorithm.

The maximal clique algorithm from that paper is the most efficient found so far.
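The slides do not name or reproduce the 1973 SIAM algorithm, so the sketch below substitutes a different, widely known method (Bron-Kerbosch with pivoting) purely to illustrate what maximal clique enumeration computes. The function name and the adjacency-dictionary input format are assumptions made for this example.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    This is Bron-Kerbosch with pivoting, used here as a stand-in;
    it is NOT the 1973 SIAM algorithm mentioned on the slide.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Tiny example graph: a triangle a-b-c plus a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = set(maximal_cliques(graph))
```

For this graph the enumeration yields the triangle {a, b, c} and the edge {c, d}.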

Journal Submission

Submitted Graph Triples Census algorithm.
  Proof of correctness
  Proof of time complexity
  Proof of space complexity

Rediscovery of 2001 algorithm in Social Networks

Most efficient implementation known
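The submitted algorithm itself is not described in the slides. As a rough illustration of what a census over node triples counts, here is a brute-force baseline, simplified to undirected graphs and to classifying each triple by how many of its three possible edges are present. Both simplifications, and the function name, are assumptions; the cubic loop is the naive approach, not the efficient one the slide refers to.

```python
from itertools import combinations


def triple_census(nodes, edges):
    """Count unordered node triples by number of edges present (0 to 3).

    Brute force over all triples: O(n^3) time, for illustration only.
    """
    edge_set = {frozenset(edge) for edge in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        present = sum(
            frozenset(pair) in edge_set for pair in combinations(triple, 2)
        )
        counts[present] += 1
    return counts


census = triple_census(
    "abcd", [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
)
```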

NZCSRSC

Poster at the conference
Written as a short user's guide

Evaluation Exploration

Incorporating cross-validation into relational data.

9 types of music recommendation
  Personalized versus generic
  Open query versus targeted query
  Dynamic versus static data
  New music versus all music

Personalized Radio

Open query with personalized presentation

Static data vs dynamic data
New items prediction vs predict anything

Targeted Search

Not personalized
Similarity queries
Automatically generating targeted lists for a browsing hierarchy
New music vs all music
Static vs dynamic data

Personalized Tag Radio

Create a personalized play list matching a given query

New music vs all music
Static vs dynamic data

Excluded Types

‘Top 40’ prediction
Rendered obsolete by other types

Cross-Validation in Graphs

Actor removal
  Only form currently used
  All links to a particular actor are removed

Link removal
  Selected links from ground truth are removed
  Algorithm evaluated on reproducing missing links
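The two removal schemes can be sketched as follows. This is a minimal illustration, assuming links are stored as (user, song) tuples; the function names and the fold-assignment strategy are hypothetical and are not taken from Graph-RAT.

```python
import random


def link_removal_folds(links, k, seed=0):
    """Split links into k folds; each fold is hidden once as the test set."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in folds:
        hidden = set(held_out)
        training = [link for link in shuffled if link not in hidden]
        # An algorithm would be trained on `training` and scored on how
        # well it reproduces the links in `held_out`.
        yield training, held_out


def actor_removal(links, actor):
    """Remove every link that touches the given actor."""
    kept = [link for link in links if actor not in link]
    removed = [link for link in links if actor in link]
    return kept, removed


links = [
    ("u1", "s1"), ("u1", "s2"),
    ("u2", "s1"), ("u2", "s3"),
    ("u3", "s2"), ("u3", "s3"),
]
splits = list(link_removal_folds(links, k=3))
kept, removed = actor_removal(links, "u1")
```

With link removal, every actor stays in the training graph and only some of its links are hidden; with actor removal, the held-out actor has no links left at all.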

Graph-RAT Improvements

Release of 0.4.4
  Finalized Graph-RAT as a relational programming language
  Added propositional algorithms

Release of 0.5.0
  New Query Subsystem
  Usability enhancements
  Space complexity improvements

Aggregators

8 algorithms with 9 helper functions
Cover each form of propositionalization
Cover mappings between links and properties
Core primitives for Graph-RAT as a programming language.
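One way to picture a mapping from links to properties: collapse the weights on each actor's outgoing links into a single per-actor value. The sketch below assumes links are (source, target, weight) tuples and that the reducer is any function over a list of weights; these names and shapes are illustrative and do not reflect the actual Graph-RAT aggregator API.

```python
from collections import defaultdict


def aggregate_links(links, reducer):
    """Collapse each actor's outgoing link weights into one property value."""
    grouped = defaultdict(list)
    for source, _target, weight in links:
        grouped[source].append(weight)
    return {actor: reducer(weights) for actor, weights in grouped.items()}


# Hypothetical play-count links from users to songs.
links = [("u1", "s1", 3.0), ("u1", "s2", 1.0), ("u2", "s1", 5.0)]
play_totals = aggregate_links(links, sum)   # total plays per user
link_counts = aggregate_links(links, len)   # number of songs per user
```

The resulting dictionaries are flat per-actor attributes, which is the form a propositional learner expects.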

Similarity

2 new similarity algorithms
1 new distance metric

Query Subsystem

28 primitives for searching in a graph
  10 graph primitives
  7 actor primitives
  7 link primitives
  4 property primitives

Functional composition to build queries
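Building queries by functional composition could look roughly like this. The primitives shown (`by_relation`, `by_source`, `all_of`) are hypothetical stand-ins chosen for the example; they are not the 28 Graph-RAT primitives, whose names the slides do not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    relation: str


# Each primitive returns a predicate; larger queries are built by composing them.
def by_relation(relation):
    return lambda link: link.relation == relation


def by_source(actor):
    return lambda link: link.source == actor


def all_of(*predicates):
    return lambda item: all(predicate(item) for predicate in predicates)


def run_query(links, predicate):
    """Lazily yield the links that satisfy the composed query."""
    return (link for link in links if predicate(link))


links = [
    Link("u1", "s1", "listened"),
    Link("u1", "rock", "tagged"),
    Link("u2", "s1", "listened"),
]
query = all_of(by_relation("listened"), by_source("u1"))
results = list(run_query(links, query))
```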

Performance Specs

Queries can return collections or iterators.

Collections
  Implemented as references into graphs
  Linear in number of references

Iterators
  Ordered sequences of objects
  Constant in space complexity (excluding Graph ID and AllGraphs)
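The space trade-off between the two return types can be illustrated with a generic sketch. The function names are hypothetical, and integers stand in for graph objects.

```python
from itertools import count, islice


def matching_collection(items, predicate):
    """Materialise all matches: memory grows with the number of matches."""
    return [item for item in items if predicate(item)]


def matching_iterator(items, predicate):
    """Yield matches one at a time: only the current item is held."""
    for item in items:
        if predicate(item):
            yield item


def is_even(n):
    return n % 2 == 0


collection = matching_collection(range(10), is_even)

# The iterator form works even on an unbounded source, because no
# results are stored between steps.
first_three = list(islice(matching_iterator(count(), is_even), 3))
```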

Usability Enhancements

Properties and Metadata
Interface enhancements
Dynamic Loading of Classes
XML scripting support

Properties and Metadata

Properties description
  Encapsulates all parameter code
  Utilizes Graph-RAT Property objects
  Comparison to JavaBeans

New Metadata Model
  Parameter model update
  Input/Output descriptors update

Interface Updates

Arrays->Lists
  graph, link, actor, and property objects

Iterators
  All graph operations support iterators

Dynamic Loading

Classes loaded from file at runtime.
Loading controlled by call to loader object
Automatic registering with relevant factories
All factories updated to support dynamic loading
  Extend Abstract Factory
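The load-then-register flow can be sketched in Python as an analogue. Everything here is an assumption made for illustration: the `AlgorithmFactory` class, the `EXPORTS` convention for listing loadable classes, and the plugin file itself are hypothetical and do not describe how Graph-RAT implements dynamic loading.

```python
import importlib.util
import tempfile
from pathlib import Path


class AlgorithmFactory:
    """Hypothetical factory keeping a name -> class registry."""
    _registry = {}

    @classmethod
    def register(cls, name, algorithm_class):
        cls._registry[name] = algorithm_class

    @classmethod
    def create(cls, name, **params):
        return cls._registry[name](**params)


def load_classes(path, factory):
    """Import a source file at runtime and register the classes it exports."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for name in getattr(module, "EXPORTS", []):
        factory.register(name, getattr(module, name))


# A stand-in plugin file, written to disk only so the example is self-contained.
plugin_source = '''
EXPORTS = ["CountLinks"]

class CountLinks:
    def __init__(self, relation="any"):
        self.relation = relation

    def execute(self, links):
        return len(links)
'''

with tempfile.TemporaryDirectory() as folder:
    plugin_path = Path(folder) / "count_links_plugin.py"
    plugin_path.write_text(plugin_source)
    load_classes(plugin_path, AlgorithmFactory)

algorithm = AlgorithmFactory.create("CountLinks", relation="listened")
result = algorithm.execute([("u1", "s1"), ("u2", "s1")])
```

The caller never imports `CountLinks` directly; it only asks the factory for it by name after the loader has run.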

XML Scripting support

SAX parser support for all components excepting crawling and parsing

Implemented using the Builder pattern
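A SAX handler driving a builder might be structured as below. The XML vocabulary (`script`, `algorithm`, `param`), the class names in the sample script, and the `PipelineBuilder` interface are all invented for this sketch; the slides do not show Graph-RAT's actual script format.

```python
import xml.sax


class PipelineBuilder:
    """Hypothetical builder that accumulates algorithm steps."""

    def __init__(self):
        self._steps = []
        self._current = None

    def start_algorithm(self, name):
        self._current = {"name": name, "params": {}}

    def add_parameter(self, key, value):
        self._current["params"][key] = value

    def end_algorithm(self):
        self._steps.append(self._current)
        self._current = None

    def build(self):
        return list(self._steps)


class ScriptHandler(xml.sax.ContentHandler):
    """Translates SAX events into builder calls."""

    def __init__(self, builder):
        super().__init__()
        self.builder = builder

    def startElement(self, name, attrs):
        if name == "algorithm":
            self.builder.start_algorithm(attrs["class"])
        elif name == "param":
            self.builder.add_parameter(attrs["name"], attrs["value"])

    def endElement(self, name):
        if name == "algorithm":
            self.builder.end_algorithm()


# Made-up script; element and class names are placeholders.
script = b"""<script>
  <algorithm class="ExampleSimilarity">
    <param name="relation" value="listened"/>
  </algorithm>
  <algorithm class="ExampleCrossValidation">
    <param name="folds" value="5"/>
  </algorithm>
</script>"""

builder = PipelineBuilder()
xml.sax.parseString(script, ScriptHandler(builder))
pipeline = builder.build()
```

Keeping the parsing in the handler and the assembly in the builder means the same builder could be driven from code as well as from XML.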

Core Improvements

2 cross-validation algorithms
~20 algorithms with space complexity improvements
Iterators for all graph primitives
Macros for separation of graph data by cross-validation property.

Additional algorithms

2 new similarity algorithms

1 new distance metric added

Obsolete algorithms removed

LastFM crawler updates

LastFM upgraded its web-services, removing the old version

New version will link to the semantic web

~20 parsers completed
Still under construction

Planned Future Work

Contingent on arrival of computer
Testing of existing code
Cross-Validation Scheduler
Completion of LastFM Parser
DBTunes (from semantic web) parser
Experiments!
Write Thesis!

Unplanned Future Work

Full semantic web crawler
Incorporating GData protocols
Database backend
Colt-Matrix-Over-Graph adapter
Database-backed Weka instance

Beyond the Horizon

Support for Prolog primitives
Multi-database graph support
Semantic Web graph utilizing the proxy pattern
Support for dynamic updates and dynamic data
