Publishing without Publishers: a Decentralized Approach to Dissemination,
Retrieval, and Archiving of Data
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
VU University Amsterdam
14th International Semantic Web Conference (ISWC 2015)
Bethlehem, Pennsylvania, USA
15 October 2015
Increasing Importance of Scientific Data
https://www.google.com/trends/explore#q=%22data%20science%22
Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 2 / 20
Scientific Data as Supplemental Material
...
http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-information
Scientific Data in Open Repositories
Problems with Current Data Publishing Solutions
• No enforcement of interoperable data formats such as RDF
• No enforcement of minimal provenance information
• A given dataset is often only available at one place, and therefore inaccessible if that website happens to be down
• No guarantee that the data has not been modified/corrupted
• No possibility to access and identify individual data entries or subsets of datasets
(Scientific) Data Publishing
Published data should be:
• Verifiable (Is this really the data I am looking for?)
• Immutable (Can I be sure that it hasn’t been modified?)
• Permanent (Will it be available in 1, 5, 20 years from now?)
• Reliable (Can it be efficiently retrieved whenever needed?)
• Granular (Can I refer to individual data entries?)
• Semantic (Can it be automatically interpreted?)
• Linked (Does it use established identifiers and ontologies?)
• Trustworthy (Can I trust the source?)
Requirement: Reliable Low-Level Operations
We need reliable low-level operations to publish data entries and datasets:
publish <data-entry>
publish <dataset>
... and operations to retrieve data entries and datasets by their identifiers:
get <data-entry-id>
get <dataset-id>
(like HTTP POST/GET but verifiable, immutable, permanent, reliable, ...)
Approach: Linked Data + Cryptography + Decentralization
Current Two-Layer Semantic Web Architectures are not Fully Reliable
Plain HTTP requests and follow-your-nose:
applications (find/query/analyze/use data)
resolvable URIs (provide data)
SPARQL endpoints:
applications (analyze/use data)
SPARQL endpoints (provide/find/query/analyze data)
Linked Data Fragments:
applications (query/analyze/use data)
LDF servers (provide/find/query data)
A single third-party server being down can break the entire application!
Proposed Multi-Layer Architecture
applications (analyze/use data)
decentralized server network (provide data)
Proposed Multi-Layer Architecture
applications (analyze/use data)
advanced services (query/analyze data)
core services (find data)
decentralized server network (provide data)
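The benefit of the bottom layer can be sketched in a few lines: if several servers hold copies of the same immutable entries, a client can simply move on to the next server when one is down. This is only a minimal illustration, not the actual client. The function `get_from_network`, the helper `make_server`, and the example server URLs are hypothetical, and in-memory dictionaries stand in for real HTTP requests.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for an HTTP GET against one server:
# returns the content, or raises if the server or the entry is unavailable.
Fetcher = Callable[[str], bytes]


def get_from_network(entry_id: str, servers: Dict[str, Fetcher]) -> Optional[bytes]:
    """Try each server in turn and return the first successful response."""
    for url, fetch in servers.items():
        try:
            return fetch(entry_id)
        except (ConnectionError, KeyError):
            continue  # server down or entry not stored there: try the next one
    return None  # no server could deliver the entry


def make_server(store: Dict[str, bytes], up: bool = True) -> Fetcher:
    """Build a fake server backed by a dictionary (assumption for this sketch)."""
    def fetch(entry_id: str) -> bytes:
        if not up:
            raise ConnectionError("server unreachable")
        return store[entry_id]
    return fetch


data = {"entry-1": b"some assertion"}
servers = {
    "http://server-a.example/": make_server(data, up=False),  # simulated outage
    "http://server-b.example/": make_server(data),
}
result = get_from_network("entry-1", servers)
```

Here the first server is unreachable, yet the request still succeeds because the second server holds the same entry.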
Nanopublications: Linked Data Containers for Provenance-Aware Semantic Publishing
assertion
provenance
publication info
nanopublication
http://nanopub.org / @nanopub_org
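As a rough mental model, a nanopublication can be pictured as a small container with three parts: the assertion, the provenance of that assertion, and publication info about the nanopublication itself. The sketch below is an assumption-laden simplification in Python. The class name `Nanopublication`, its field names, and all `ex:` identifiers are hypothetical, and plain string triples stand in for real RDF graphs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A triple of (subject, predicate, object) as plain strings (simplification).
Triple = Tuple[str, str, str]


@dataclass
class Nanopublication:
    uri: str
    assertion: List[Triple] = field(default_factory=list)    # the actual claim
    provenance: List[Triple] = field(default_factory=list)   # where the claim comes from
    pubinfo: List[Triple] = field(default_factory=list)      # metadata about the nanopublication

    def is_complete(self) -> bool:
        """All three parts must be non-empty in this simplified model."""
        return bool(self.assertion) and bool(self.provenance) and bool(self.pubinfo)


np1 = Nanopublication(
    uri="ex:np1",
    assertion=[("ex:gene1", "ex:isAssociatedWith", "ex:disease1")],
    provenance=[("ex:np1#assertion", "ex:wasDerivedFrom", "ex:experiment1")],
    pubinfo=[("ex:np1", "ex:createdBy", "ex:researcher1")],
)
```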
Trusty URIs: Cryptographic Hash Values for Verifiable and Immutable Web Identifiers
Nanopublications with Trusty URIs are ...
Verifiable + Immutable + Permanent
http://example.org/r1.RA5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70.trig
http://trustyuri.net/
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
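The idea of putting a hash into the identifier can be illustrated as follows. The sketch assumes, based on the example URI on this slide, that the identifier ends in a two-character module code (here `RA`) followed by 43 Base64url characters, optionally followed by a file extension. The real `RA` module hashes a normalized form of the RDF content, which is not reproduced here; `toy_code` just hashes raw bytes with SHA-256 to show the principle, and `ZZ` is a made-up module code. All function names are hypothetical.

```python
import base64
import hashlib
import re
from typing import Tuple

# prefix, 2-character module code, 43 Base64url characters, optional extension
_PATTERN = re.compile(r"^(.*?)([A-Z]{2})([A-Za-z0-9_-]{43})(\.[A-Za-z0-9]+)?$")


def split_trusty_uri(uri: str) -> Tuple[str, str, str, str]:
    """Split a URI into (prefix, module, hash, extension)."""
    m = _PATTERN.match(uri)
    if m is None:
        raise ValueError("not in the assumed trusty URI form: " + uri)
    return m.group(1), m.group(2), m.group(3), m.group(4) or ""


def decode_hash(code: str) -> bytes:
    """43 Base64url characters plus one padding character decode to 32 bytes."""
    return base64.urlsafe_b64decode(code + "=")


def toy_code(data: bytes) -> str:
    """Toy hash part: SHA-256 of the raw bytes, Base64url without padding."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def toy_verify(uri: str, data: bytes) -> bool:
    """Check that the hash inside the URI matches the given content."""
    return split_trusty_uri(uri)[2] == toy_code(data)


parts = split_trusty_uri(
    "http://example.org/r1.RA5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70"
)
content = b"example content"
toy_uri = "http://example.org/r1.ZZ" + toy_code(content)  # "ZZ" is hypothetical
```

Because the hash is part of the identifier, anyone holding the URI can check retrieved content without trusting the server that delivered it: `toy_verify(toy_uri, content)` succeeds, while any modified content fails the check.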
Decentralized and Reliable Publishing with a Nanopublication Server Network
Nanopublications with Trusty URIs
Publication
Retrieval
Propagation / Archiving
http://npmonitor.inn.ac
Defining Datasets with Nanopublication Indexes (which are themselves Nanopublications)
[Figure: example index structures, panels (a) to (f), built from the relations "appends", "has sub-index", and "has element"]
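How a dataset could be assembled from such indexes can be sketched as a small recursive lookup. The semantics here are assumptions read off the relation names: "has element" lists nanopublications directly, "has sub-index" includes everything from another index, and "appends" continues an earlier index. The `Index` class, the `resolve` function, and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Index:
    elements: List[str] = field(default_factory=list)     # "has element"
    sub_indexes: List[str] = field(default_factory=list)  # "has sub-index"
    appends: Optional[str] = None                          # "appends"


def resolve(index_id: str, indexes: Dict[str, Index],
            seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect all nanopublication identifiers reachable from an index."""
    seen = set() if seen is None else seen
    if index_id in seen:
        return set()  # already visited: its elements are already collected
    seen.add(index_id)
    idx = indexes[index_id]
    result = set(idx.elements)
    if idx.appends is not None:
        result |= resolve(idx.appends, indexes, seen)
    for sub in idx.sub_indexes:
        result |= resolve(sub, indexes, seen)
    return result


indexes = {
    "i1": Index(elements=["np1", "np2"]),
    "i2": Index(elements=["np3"], appends="i1"),       # i2 continues i1
    "i3": Index(elements=["np4"], sub_indexes=["i2"]),  # i3 includes i2
}
dataset = resolve("i3", indexes)
```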
Nanopublication Server Components and API
Nanopublication Server Components:
• Key-value store of nanopublications (trusty URI as the key)
• Journal: list of identifiers of all loaded nanopublications, subdivided into pages
• List of known peers: the URLs of other servers
Simple REST API (as GET/POST requests):
• Retrieve nanopublication in a format like TriG, TriX, or N-Quads (depending on content negotiation) for a given trusty URI
• Publish new nanopublication to the network
• Retrieve URLs of other servers in the network
• Retrieve journal page or gzipped package of it (for propagation of nanopublications to other servers)
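The three server components and the way journal pages support propagation can be mimicked with an in-memory toy model. This is a sketch under stated assumptions, not the real server: the class `ToyNanopubServer`, its method names, and the `page_size` parameter are hypothetical, and method calls stand in for the GET/POST requests.

```python
from typing import Dict, List, Optional


class ToyNanopubServer:
    def __init__(self, page_size: int = 1000) -> None:
        self.store: Dict[str, str] = {}  # key-value store: trusty URI -> content
        self.journal: List[str] = []     # identifiers in the order they were loaded
        self.peers: List[str] = []       # URLs of other known servers
        self.page_size = page_size       # hypothetical page size

    def publish(self, trusty_uri: str, content: str) -> None:
        """Store a nanopublication once and record it in the journal."""
        if trusty_uri not in self.store:
            self.store[trusty_uri] = content
            self.journal.append(trusty_uri)

    def get(self, trusty_uri: str) -> Optional[str]:
        return self.store.get(trusty_uri)

    def page_count(self) -> int:
        return (len(self.journal) + self.page_size - 1) // self.page_size

    def journal_page(self, page_no: int) -> List[str]:
        """Return one journal page (pages are numbered from 1 in this sketch)."""
        start = (page_no - 1) * self.page_size
        return self.journal[start:start + self.page_size]

    def pull_from(self, other: "ToyNanopubServer") -> None:
        """Propagation: walk the peer's journal pages and copy what is missing."""
        for page_no in range(1, other.page_count() + 1):
            for key in other.journal_page(page_no):
                if key not in self.store:
                    self.publish(key, other.get(key) or "")


a = ToyNanopubServer(page_size=2)
for i in range(5):
    a.publish(f"k{i}", f"content {i}")
a.publish("k0", "ignored duplicate")  # already stored, so nothing changes

b = ToyNanopubServer(page_size=2)
b.pull_from(a)
```

Since entries are immutable and keyed by their identifier, propagation only ever needs to add missing entries; there are no updates to reconcile.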
Nanopublication Server Network is Efficient and Scalable
Our servers can deliver nanopublications about 100 times faster than when a triple store is used (and need far fewer resources):
[Figure: response time in seconds (logarithmic axis, 0.1 to 100) plotted against time from start of test in seconds (0 to 300) and against number of clients accessing the service in parallel (0 to 100), comparing a Virtuoso triple store with SPARQL endpoint and the nanopublication server]
Nanopublication Datasets
Reliable Low-Level Publish/Retrieve Operations!
Operation to publish data:
$ np publish nanopubs.trig
156026 nanopubs published at http://np.inn.ac/
which can also be used to publish dataset definitions (indexes):
$ np publish index.trig
157 nanopubs published at http://np.inn.ac/
Operation to retrieve data entries:
$ np get http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQDi6M4O8mgR0CqE
and to retrieve entire datasets:
$ np get -c http://np.inn.ac/RAY_lQruuagCYtAcKAPptkY7EpITwZeUilGHsWGm9ZWNI
https://github.com/Nanopublication/nanopub-java
Future Work
• Allow servers to specify what kind of nanopublications they store
• Develop Core and Advanced Services on top of the server network
• Establish best practices for versioning, retractions, reviews, etc.
• Connect it all to the scientific publishing workflow
Thank you for your attention!
Questions?
Further information:
• Nanopublications: http://nanopub.org
• Trusty URIs: http://trustyuri.net
• Nanopublication Server Network: http://npmonitor.inn.ac