Presentation about User Contributed Interlinking (UCI) at the Scripting for the Semantic Web (SFSW) 2008 workshop at the European Semantic Web Conference (ESWC) 2008.
Institute of Information Systems & Information Management
Scripting User Contributed Interlinking
Michael Hausenblas, Wolfgang Halb, and Yves Raimond
SFSW08, Tenerife, Spain
2008-06-02
2
Agenda
- Linked Data 101
- A first step in UCI – http://riese.joanneum.at
- Towards Generalising UCI
- Demo
3
Linked Data: Principles
- Items should be identified using URI references [URIrefs] (and: don’t use bNodes)
- URIrefs should be dereferenceable: using HTTP URIs allows looking up the items identified through URIrefs, cf. [http-range-14 TAG finding]
- Looking up a URIref leads to more data [follow-your-nose principle]
- Links to other URIrefs should be included in order to enable the discovery of more data [How to Publish Linked Data on the Web]
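To make the dereference-and-follow-links cycle concrete, here is a minimal Python sketch of a follow-your-nose crawler. It is an illustration, not code from riese or I R S. To keep it runnable offline, `dereference` is a hypothetical stand-in that reads triples from an in-memory dictionary; a real client would issue an HTTP GET with an `Accept: application/rdf+xml` header and parse the RDF it gets back. All example.org URIs are made up.

```python
from collections import deque

# Hypothetical stand-in for the Web: URIref -> triples a server would return
# when that URIref is dereferenced. Literals are written as quoted strings.
WEB = {
    "http://example.org/a": [
        ("http://example.org/a", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/b"),
    ],
    "http://example.org/b": [
        ("http://example.org/b", "http://www.w3.org/2000/01/rdf-schema#seeAlso", "http://example.org/c"),
        ("http://example.org/b", "http://www.w3.org/2000/01/rdf-schema#label", '"B"'),
    ],
    "http://example.org/c": [],
}


def dereference(uri):
    """Stand-in for an HTTP GET; returns [] when nothing is published at uri."""
    return WEB.get(uri, [])


def is_uri(term):
    return term.startswith(("http://", "https://"))


def follow_your_nose(start, max_docs=100):
    """Breadth-first crawl: dereference a URIref, keep its triples and queue
    every URIref found in object position, up to max_docs lookups."""
    visited, queue, triples = set(), deque([start]), []
    while queue and len(visited) < max_docs:
        uri = queue.popleft()
        if uri in visited:
            continue
        visited.add(uri)
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            if is_uri(o) and o not in visited:
                queue.append(o)
    return visited, triples
```

Calling `follow_your_nose("http://example.org/a")` visits a, b and c: b is reached via owl:sameAs, c via rdfs:seeAlso, and the literal label is not followed. The `max_docs` cap matters because the dereferencing chain on the live Web is open-ended.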
4
Linked Data: Datasets (2008)
By courtesy of Richard Cyganiak, http://richard.cyganiak.de/2007/10/lod/
5
Linked Data: Issues
Building
- RDFising process (schema, mapping)
- Interlinking (automagically, manual)
- Deployment (SPARQL endpoint, dump, RDFa, etc.)
Using
- Provenance, trust, rights, etc.
- Access (depending on deployment)
- Performance (dereferencing chain, reliability)
- Discovery (which is the right LOD dataset for my task?)
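Automatic interlinking, in its simplest form, matches resources across two datasets on a shared property such as a normalised label. The sketch below is a deliberately naive illustration of that idea, not the interlinking code used in riese; the function name `interlink` and all example URIs are hypothetical. It emits owl:sameAs candidates as N-Triples lines and skips ambiguous matches, which is exactly where manual, user contributed interlinking has to step in.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Strip accents and surrounding whitespace, then lower-case."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.strip().lower()


def interlink(source, target):
    """source/target map resource URI -> label. Returns N-Triples lines linking
    each source URI to the single target URI with the same normalised label;
    labels matching several targets are skipped as ambiguous."""
    index = {}
    for uri, label in target.items():
        index.setdefault(normalise(label), []).append(uri)
    links = []
    for uri, label in sorted(source.items()):
        candidates = index.get(normalise(label), [])
        if len(candidates) == 1:
            links.append(f"<{uri}> <{OWL_SAME_AS}> <{candidates[0]}> .")
    return links
```

With a source label "Zürich" and a target label "Zurich", the accent stripping produces a match; a label that occurs twice in the target is left for a human to decide.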
6
A first step in UCI - riese
http://riese.joanneum.at
7
riese: A first step in UCI
- riese, the ‘RDFizing and Interlinking the EuroStat Dataset Effort’, aims to offer an RDFised and interlinked version of the Eurostat data (http://ec.europa.eu/eurostat)
- Eurostat data is high-volume data (5 GB data dump in approx. 4,000 TSV files; 350 million data values; 80,000 different data codes)
- Currently we serve 3.6 million triples, interlinking with Geonames (DBpedia and WordNet upcoming)
- Data is exposed as XHTML+RDFa, via a SPARQL endpoint and as a dump (+ semantic sitemap description)
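Exposing data as XHTML+RDFa means the same page serves humans and machines. The following sketch renders one statistical item as an RDFa 1.0 fragment using scovo terms; it is an assumption-laden illustration, not the riese page template. The function name `item_to_rdfa`, the choice of rdf:value for the number and all example URIs are hypothetical.

```python
from html import escape

# Prefix declarations for the CURIEs used below (RDFa 1.0 style xmlns:).
PREFIXES = ('xmlns:scv="http://purl.org/NET/scovo#" '
            'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"')


def item_to_rdfa(item_uri, value, dataset_uri, dimension_uris):
    """Render one statistical item as an XHTML+RDFa fragment: the item is typed
    scv:Item, its value is a literal via rdf:value, and links to the dataset and
    dimensions are expressed with rel/href."""
    lines = [
        f'<div {PREFIXES} about="{escape(item_uri)}" typeof="scv:Item">',
        f'  <span property="rdf:value">{escape(str(value))}</span>',
        f'  <a rel="scv:dataset" href="{escape(dataset_uri)}">dataset</a>',
    ]
    for dim in dimension_uris:
        lines.append(f'  <a rel="scv:dimension" href="{escape(dim)}">{escape(dim)}</a>')
    lines.append("</div>")
    return "\n".join(lines)
```

Each item costs several elements, which is why the volume of fine-grained data quickly turns into a question of how much to embed per page.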
8
riese: architecture
9
riese: inside
Server
- Apache 2.2, SWI-Prolog, PHP 5
- p2r/Ceriese (see Yves’s blog post)
- RDF/XML documents in the file system
Client
- XHTML+RDFa
- JavaScript/Yahoo! User Interface Library [YUI]
Vocabulary
- triggered the development of scovo, the Statistical Core Vocabulary, together with Talis and Lee Feigenbaum, see http://purl.org/NET/scovo
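The RDFising step can be pictured as turning each TSV row into scovo items. This sketch assumes the Eurostat bulk TSV layout as we understand it (first cell holds comma-separated dimension codes such as `unit,geo\time`, one column per time period, `:` for a missing value, a footnote flag after a space); it is not the riese converter. The base URI, the URI pattern for items and dimensions, and the use of rdf:value with xsd:double are hypothetical choices.

```python
# Hypothetical base URI; the real riese URI scheme is not reproduced here.
BASE = "http://example.org/riese/"
SCV = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def parse_header(header):
    """Return (dimension names, time periods) from a TSV header line whose
    first cell looks like 'unit,geo\\time' (assumed Eurostat layout)."""
    cells = header.rstrip("\n").split("\t")
    dims = cells[0].split("\\")[0].split(",")
    periods = [c.strip() for c in cells[1:]]
    return dims, periods


def parse_value(cell):
    """Return a float, or None for ':' (missing); a flag after a space is dropped."""
    token = cell.strip().split(" ")[0]
    if token in ("", ":"):
        return None
    return float(token)


def row_to_ntriples(dataset, header, row):
    """Turn one data row into N-Triples: one scv:Item per non-missing value,
    linked to its dataset and to one scv:dimension per code and period."""
    dims, periods = parse_header(header)
    cells = row.rstrip("\n").split("\t")
    codes = cells[0].split(",")
    out = []
    for period, cell in zip(periods, cells[1:]):
        value = parse_value(cell)
        if value is None:
            continue
        item = f"<{BASE}{dataset}/{'/'.join(codes)}/{period}>"
        out.append(f"{item} <{RDF}type> <{SCV}Item> .")
        out.append(f"{item} <{SCV}dataset> <{BASE}{dataset}> .")
        out.append(f'{item} <{RDF}value> "{value}"^^<{XSD_DOUBLE}> .')
        for dim, code in zip(dims, codes):
            out.append(f"{item} <{SCV}dimension> <{BASE}{dim}/{code}> .")
        out.append(f"{item} <{SCV}dimension> <{BASE}time/{period}> .")
    return out
```

A row with one present and one missing value yields six triples for the present value and none for the missing one, so the 350 million values translate into several times as many triples.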
10
riese: User Contributed Interlinking
11
riese: User Contributed Interlinking
12
riese: issues
- Dynamic content (Ajax) vs. embedded metadata (RDFa): a local agent has the data in the DOM, but an external agent cannot access it. No real solution yet.
- Scalability & performance: when data is fine-grained and high-volume, how much should be embedded directly in a page?
- How to notify users about data updates? We are currently experimenting with AtomOwl deployed in RDFa (http://riese.joanneum.at/updates/)
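For update notification, the sketch below swaps in a plain Atom 1.0 feed (RFC 4287) built with Python's ElementTree instead of AtomOwl in RDFa, simply to show the shape of such a feed; it is not the riese updates page. The function name `updates_feed`, the feed and entry identifiers and the author name are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace


def _el(parent, tag, text=None):
    node = ET.SubElement(parent, f"{{{ATOM}}}{tag}")
    if text is not None:
        node.text = text
    return node


def updates_feed(feed_id, title, author, updates):
    """Build an Atom 1.0 feed announcing data updates.
    updates: list of (entry_id, entry_title, timezone-aware datetime)."""
    ordered = sorted(updates, key=lambda u: u[2], reverse=True)  # newest first
    feed = ET.Element(f"{{{ATOM}}}feed")
    _el(feed, "id", feed_id)
    _el(feed, "title", title)
    _el(feed, "updated", (ordered[0][2] if ordered else datetime.now(timezone.utc)).isoformat())
    _el(_el(feed, "author"), "name", author)
    for entry_id, entry_title, when in ordered:
        entry = _el(feed, "entry")
        _el(entry, "id", entry_id)
        _el(entry, "title", entry_title)
        _el(entry, "updated", when.isoformat())
    return ET.tostring(feed, encoding="unicode")
```

Atom requires an author and an updated timestamp at feed level; taking the feed timestamp from the newest entry keeps the two consistent.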
13
Towards Generalising UCI
The next step after riese was to decouple UCI and generalise it. The result is I R S (Interlinking of Resources with Semantics; see also the poster session).
I R S features
- query, add and remove semantic links (owl:sameAs, rdfs:seeAlso, foaf:topic, etc.)
- subject and object can be set by the user (restriction: URIs only)
- resource preview (debug)
- data exposed in XHTML+RDFa and via a SPARQL endpoint
- lookup in http://sindice.com for unknown resources
- simple provenance tracking through named graphs
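The feature list maps naturally onto a small link store. This is a hypothetical in-memory sketch, not the I R S implementation: the class `LinkStore`, the one-named-graph-per-contributor scheme and the restriction to HTTP(S) URIs (a narrower reading of "URIs only") are assumptions. It supports add, remove and pattern queries, and exports N-Quads so each link carries its graph as provenance.

```python
from urllib.parse import urlparse

# Link types named on the slide; the "etc." is deliberately left open.
LINK_TYPES = {
    "owl:sameAs": "http://www.w3.org/2002/07/owl#sameAs",
    "rdfs:seeAlso": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
    "foaf:topic": "http://xmlns.com/foaf/0.1/topic",
}


def is_http_uri(term):
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class LinkStore:
    """Hypothetical in-memory link store. Each contributor writes into their own
    named graph, which gives simple provenance: every link knows its graph."""

    def __init__(self):
        self.graphs = {}  # graph URI -> set of (subject, predicate, object)

    def add(self, graph, subject, link_type, obj):
        if link_type not in LINK_TYPES:
            raise ValueError(f"unsupported link type: {link_type}")
        if not (is_http_uri(subject) and is_http_uri(obj)):
            raise ValueError("subject and object must be HTTP(S) URIs")
        self.graphs.setdefault(graph, set()).add((subject, LINK_TYPES[link_type], obj))

    def remove(self, graph, subject, link_type, obj):
        """Remove a link from one graph; return True if it was there."""
        triple = (subject, LINK_TYPES.get(link_type, link_type), obj)
        links = self.graphs.get(graph, set())
        if triple in links:
            links.discard(triple)
            return True
        return False

    def query(self, subject=None, link_type=None, obj=None):
        """Return (s, p, o, graph) quads matching the pattern; None is a wildcard."""
        predicate = LINK_TYPES.get(link_type, link_type)
        return [
            (s, p, o, g)
            for g, links in sorted(self.graphs.items())
            for s, p, o in sorted(links)
            if subject in (None, s) and predicate in (None, p) and obj in (None, o)
        ]

    def to_nquads(self):
        return [f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in self.query()]
```

Keeping contributions in separate graphs means a bad contributor's links can be reviewed or dropped as a unit, which is the point of provenance tracking here.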
14
Towards Generalising UCI: I R S
15
Towards Generalising UCI: I R S
16
I R S issues
- The motivation for end users to contribute has yet to be researched
- Trust issues arise (we are experimenting with OpenID)
- Generic UCI requires a high level of abstraction (maybe only suitable for geeks, not for end users)
- To get an overview of what is available, some other mechanism should be offered (currently only the SPARQL endpoint)
- Validation of resources is desirable (e.g. type of target, information vs. non-information resource, etc.)
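One part of resource validation, telling information from non-information resources, can be approximated along the lines of the httpRange-14 TAG finding. The sketch below is a heuristic, not an I R S feature: hash URIs are flagged without a request, otherwise a HEAD request (which `http.client` issues without following redirects) classifies 2xx as an information resource and 303 as probably a non-information resource. The function names and labels are hypothetical, and `status_of` can be swapped for a stub in tests.

```python
import http.client
from urllib.parse import urlsplit


def head_status(uri, timeout=10):
    """HTTP HEAD without following redirects (http.client never follows them)."""
    parts = urlsplit(uri)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def classify(uri, status_of=head_status):
    """Rough httpRange-14 style check of a link target."""
    if urlsplit(uri).fragment:
        # Hash URI: the fragment is never sent to the server, so the
        # response describes the document, not the thing itself.
        return "hash-uri"
    status = status_of(uri)
    if status == 303:
        # 303 See Other: may denote any resource; Linked Data practice uses
        # it for non-information resources.
        return "probably-non-information"
    if 200 <= status < 300:
        return "information"  # 2xx: an information resource
    return "unknown"
```

Checking the type of the target (e.g. that an owl:sameAs target describes a place rather than a document about it) would need the dereferenced RDF as well and is not covered here.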
17
Discussion
- UCI can help create high-quality semantic links
- The social process needs to be researched (it might turn out to be pretty similar to the wiki ecosystem)
- Some types of content, such as multimedia content, might benefit more from UCI than others
- Is generic UCI only for geeks? To be really successful, UCI likely needs to be embedded into a domain-specific application
- BTW, I R S is also a nice LOD debugger ;)
Questions?