publishing without publishers: a decentralized approach to dissemination, retrieval, and archiving...

20
Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data Tobias Kuhn http://www.tkuhn.org @txkuhn VU University Amsterdam 14th International Semantic Web Conference (ISWC 2015) Bethlehem, Pennsylvania, USA 15 October 2015

Upload: tobias-kuhn

Post on 20-Jan-2017

1.280 views

Category:

Science


3 download

TRANSCRIPT

Page 1: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Publishing without Publishers: aDecentralized Approach to Dissemination,

Retrieval, and Archiving of Data

Tobias Kuhn

http://www.tkuhn.org

@txkuhn

VU University Amsterdam

14th International Semantic Web Conference (ISWC 2015)Bethlehem, Pennsylvania, USA

15 October 2015

Page 2: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Increasing Importance of Scientific Data

https://www.google.com/trends/explore#q=%22data%20science%22

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 2 / 20

Page 3: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Scientific Data as Supplemental Material

...

http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-information

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 3 / 20

Page 4: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Scientific Data in Open Repositories

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 4 / 20

Page 5: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Problems with Current Data PublishingSolutions

• No enforcement of interoperable data formats such as RDF

• No enforcement of minimal provenance information

• A given dataset is often only available at one place, andtherefore inaccessible if that website happens to be down

• No guarantee that the data has not been modified/corrupted

• No possibility to access and identify individual data entries orsubsets of datasets

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 5 / 20

Page 6: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

(Scientific) Data Publishing

Published data should be:

• Verifiable (Is this really the data I am looking for?)

• Immutable (Can I be sure that it hasn’t been modified?)

• Permanent (Will it be available in 1, 5, 20 years from now?)

• Reliable (Can it be efficiently retrieved whenever needed?)

• Granular (Can I refer to individual data entries?)

• Semantic (Can it be automatically interpreted?)

• Linked (Does it use established identifiers and ontologies?)

• Trustworthy (Can I trust the source?)

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 6 / 20

Page 7: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Requirement: Reliable Low-Level Operations

We need reliable low-level operations to publish data entries anddatasets:

publish <data-entry>

publish <dataset>

... and operations to retrieve data entries and datasets by theiridentifiers:

get <data-entry-id>

get <dataset-id>

(like HTTP POST/GET but verifiable, immutable, permanent, reliable, ...)

Approach: Linked Data + Cryptography + Decentralization

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 7 / 20

Page 8: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Current Two-Layer Semantic Web Architecturesare not Fully Reliable

plain HTTP requestsand follow-your-nose:

applications (find/query/analyze/use data)

resolvable URIs (provide data)

SPARQL endpoints:applications (analyze/use data)

SPARQL endpoints (provide/find/query/analyze data)

Linked Data Fragments:applications (query/analyze/use data)

LDF servers (provide/find/query data)

1A single third-party server being down can break the entireapplication!

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 8 / 20

Page 9: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Proposed Multi-Layer Architecture

applications (analyze/use data)

decentralized server network (provide data)

1

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 9 / 20

Page 10: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Proposed Multi-Layer Architecture

applications (analyze/use data)

advanced services (query/analyze data)

core services (find data)

decentralized server network (provide data)

1

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 10 / 20

Page 11: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Nanopublications: Linked Data Containers forProvenance-Aware Semantic Publishing

assertion

provenance

publication info

nanopublication

http://nanopub.org / @nanopub org

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 11 / 20

Page 12: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Trusty URIs: Cryptographic Hash Values forVerifiable and Immutable Web Identifiers

Nanopublications with Trusty URIs are ...

XVerifiable

+

Immutable

+ �Permanent

.trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70

http://trustyuri.net/

Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 12 / 20

Page 13: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Decentralized and Reliable Publishing with aNanopublication Server Network

Nanopublicationswith Trusty URIs

Publication

Retrieval

Propagation / Archiving

http://npmonitor.inn.ac

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 13 / 20

Page 14: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Defining Datasets with Nanopublication Indexes(which are themselves Nanopublications)

appends

has sub-index

has element

(a) (b)

(c) (f)

(d) (e)

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 14 / 20

Page 15: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Nanopublication Server Components and API

Nanopublication Server Components:

• Key-value store of nanopublications (trusty URI as the key)

• Journal: list of identifiers of all loaded nanopublications,subdivided into pages

• List of known peers: the URLs of other servers

Simple REST API (as GET/POST requests):

• Retrieve nanopublication in a format like TriG, TriX, or N-Quads(depending on content negotiation) for a given trusty URI

• Publish new nanopublication to the network

• Retrieve URLs of other servers in the network

• Retrieve journal page or gzipped package of it (for propagationof nanopublications to other servers)

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 15 / 20

Page 16: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Nanopublication Server Network isEfficient and Scalable

Our servers can deliver nanopublications about 100 times faster thanwhen a triple store is used (and need much less resources):

time from start of test in seconds

resp

onse

tim

e in

sec

onds

0 50 100 150 200 250 3000 50 100 150 200 250 300

0.1

1

10

100

0 20 40 60 80 100

number of clients accessing the service in parallel

Virtuoso triple store with SPARQL endpointnanopublication server

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 16 / 20

Page 17: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Nanopublication Datasets

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 17 / 20

Page 18: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Reliable Low-Level Publish/Retrieve Operations!

Operation to publish data:

$ np publish nanopubs.trig

156026 nanopubs published at http://np.inn.ac/

which can also be used to publish dataset definitions (indexes):

$ np publish index.trig

157 nanopubs published at http://np.inn.ac/

Operation to retrieve data entries:

$ np get http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQDi6M4O8mgR0CqE

and to retrieve entire datasets:

$ np get -c http://np.inn.ac/RAY lQruuagCYtAcKAPptkY7EpITwZeUilGHsWGm9ZWNI

https://github.com/Nanopublication/nanopub-java

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 18 / 20

Page 19: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Future Work

• Allow server to specify what kind of nanopublications they store

• Develop Core and Advanced Services on top of the servernetwork

• Establish best practices for versioning, retractions, reviews, etc.

• Connect it all to the scientific publishing workflow

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 19 / 20

Page 20: Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

Thank you for your attention!

Questions?

Further information:

• Nanopublications: http://nanopub.org

• Trusty URIs: http://trustyuri.net

• Nanopublication Server Network: http://npmonitor.inn.ac

Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 20 / 20