PID Signposting Pattern

Page 1: PID Signposting Pattern

Herbert Van de Sompel PIDapalooza, Reykjavik, Iceland, 10 Nov 2016

Cartoon by Patrick Hochstenbach

Herbert Van de Sompel, LANL & DANS

@hvdsomp, http://orcid.org/0000-0002-0715-6126

Acknowledgments: Geoff Bilder, Shawn Jones, Martin Klein, Michael L. Nelson, David Rosenthal, Harihar Shankar, Simeon Warner, Karl Ward, Joe Wass

A Signposting Pattern for PIDs

http://signposting.org

Signposting is funded by the Andrew W. Mellon Foundation

Page 2: PID Signposting Pattern

Outline

• A Disconcerting Observation

• A Proposed Fix Using Signposting

• Signposting, The Bigger Picture

• Additional Signposting Patterns

Page 3: PID Signposting Pattern

Large Scale Study into Reference Rot for Links to Web-at-Large Resources Found in STM Articles

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. Under review.

Page 4: PID Signposting Pattern

STM Articles in the Study

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

STM articles published 1997-2012 | arXiv | PMC | total
Per corpus | 707,667 | 479,194 | 1,186,861
With URI references to articles | 51,574 | 240,857 | 292,431
With URI references to web-at-large resources | 142,134 | 156,160 | 298,294

Page 5: PID Signposting Pattern

Articles that Link to Articles & to Web-at-Large Resources (PMC)

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

Page 6: PID Signposting Pattern

URI References in the Study

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

URI References | arXiv | PMC | total
Per corpus | 781,895 | 1,653,567 | 2,435,462
Excluded | 1,555 | 428,036 | 429,591
To articles | 434,163 | 744,678 | 1,178,841
To web-at-large resources | 346,177 | 480,853 | 827,030
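
The figures in this table are internally consistent: per corpus, the excluded references plus those to articles plus those to web-at-large resources equal the per-corpus count, and the total column is arXiv plus PMC. A small Python check of that arithmetic, using only the figures from the table (the names TABLE, CATEGORIES and check_table are illustrative):

```python
# Figures copied from the "URI References in the Study" table: (arXiv, PMC, total).
TABLE = {
    "per corpus": (781_895, 1_653_567, 2_435_462),
    "excluded": (1_555, 428_036, 429_591),
    "to articles": (434_163, 744_678, 1_178_841),
    "to web-at-large resources": (346_177, 480_853, 827_030),
}
CATEGORIES = ("excluded", "to articles", "to web-at-large resources")


def check_table(table):
    """Return a list of arithmetic inconsistencies (empty if the table adds up)."""
    problems = []
    # Row check: total = arXiv + PMC.
    for row, (arxiv, pmc, total) in table.items():
        if arxiv + pmc != total:
            problems.append(f"row {row!r}: {arxiv} + {pmc} != {total}")
    # Column check: the three categories partition the per-corpus count.
    for col, name in enumerate(("arXiv", "PMC", "total")):
        summed = sum(table[row][col] for row in CATEGORIES)
        if summed != table["per corpus"][col]:
            problems.append(f"column {name}: categories sum to {summed}")
    return problems


if __name__ == "__main__":
    print(check_table(TABLE) or "all rows and columns are consistent")
```

Run as is, it prints "all rows and columns are consistent".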

Page 7: PID Signposting Pattern

URI References to Articles & to Web-at-Large Resources (PMC)

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

Page 8: PID Signposting Pattern

A Disconcerting Observation

• When classifying URI references as linking to articles, we assumed that filtering on http://dx.doi.org/* would do the trick

• But we found many URI references such as http://link.springer.com/article/*

• For example:
  • http://link.springer.com/article/10.1007%2Fs00799-014-0108-0

• Instead of:
  • http://dx.doi.org/10.1007/s00799-014-0108-0

• We used CrossRef’s Reverse Domain Lookup to classify these URI references as linking to articles, and continued with our reference rot research
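
The classification step can be sketched as follows. This is a hypothetical illustration, not the study's code: the hosts in DOI_RESOLVER_HOSTS and the single entry in PUBLISHER_BASE_URLS are assumptions standing in for the full list of publisher base URLs that CrossRef's Reverse Domain Lookup provided.

```python
from urllib.parse import urlsplit

# Hosts that serve DOI URIs (assumption: the legacy and current resolver hosts).
DOI_RESOLVER_HOSTS = {"dx.doi.org", "doi.org"}

# Hypothetical stand-in for CrossRef's Reverse Domain Lookup: host + path
# prefixes under which publishers serve DOI-identified articles.
PUBLISHER_BASE_URLS = ("link.springer.com/article/",)


def classify_reference(uri: str) -> str:
    """Label a URI reference as 'DOI', 'shouldbeDOI' or 'web-at-large'."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if host in DOI_RESOLVER_HOSTS:
        return "DOI"  # already links via the DOI resolver
    # Compare host + path so http and https variants are treated alike.
    if (host + parts.path).startswith(PUBLISHER_BASE_URLS):
        return "shouldbeDOI"  # publisher URL for an article that has a DOI
    return "web-at-large"


if __name__ == "__main__":
    for uri in (
        "http://dx.doi.org/10.1007/s00799-014-0108-0",
        "http://link.springer.com/article/10.1007%2Fs00799-014-0108-0",
        "http://robustlinks.mementoweb.org",
    ):
        print(classify_reference(uri), uri)
```

A real classifier would need the complete list of publisher base URLs, and that list changes over time.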

Page 9: PID Signposting Pattern

Hiberlink Results: Link Rot - arXiv

Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253

Page 10: PID Signposting Pattern

Hiberlink Results: Content Drift - arXiv

Shawn Jones, Herbert Van de Sompel, et al. (2016) Scholarly context adrift. Under review.

Page 11: PID Signposting Pattern

Hiberlink Results: Robust Links

http://robustlinks.mementoweb.org

Page 12: PID Signposting Pattern

A Closer Look at the Disconcerting Observation

Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102

Page 13: PID Signposting Pattern

A Closer Look at the Disconcerting Observation - arXiv

Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102

Page 14: PID Signposting Pattern

A Closer Look at the Disconcerting Observation - PMC

Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102

Page 15: PID Signposting Pattern

Caveats Regarding the Disconcerting Observation

• CrossRef’s publisher baseURLs represent the state of the DOI resolver at the time of the research
  • Some shouldbeDOI references may have been classified as web-at-large because old publisher baseURLs are no longer in the resolver

• At the time of the research, no public information was available about when a publisher started to assign DOIs
  • Some references may have been wrongly classified as shouldbeDOI because the publisher was not yet assigning DOIs in earlier years

• Findings for recent years are not affected by these caveats

Page 16: PID Signposting Pattern

Content Types of "200 OK" shouldbeDOI Resources, Year 2012

Content Type | arXiv | PMC
text/html | 19,649 | 63,769
application/pdf | 153 | 1,813
text/plain | 7 | 3,924
image/jpeg | 1 | 64
other | 46 | 74
none provided | 2,118 | 5,210
total | 21,974 | 74,854
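
A table like this one can be built by bucketing the Content-Type header of each "200 OK" response. A minimal Python sketch, assuming the same rows as the table and that media-type parameters such as charset are ignored; content_type_bucket and tally are hypothetical names, and the study's actual procedure may differ:

```python
from collections import Counter
from typing import Iterable, Optional

# Media types that have their own row in the table; anything else is "other".
LISTED_TYPES = ("text/html", "application/pdf", "text/plain", "image/jpeg")


def content_type_bucket(header: Optional[str]) -> str:
    """Map a raw Content-Type header value (None if absent) to a table row."""
    if header is None or not header.strip():
        return "none provided"
    # Drop parameters such as "; charset=UTF-8"; media types are case-insensitive.
    media_type = header.split(";", 1)[0].strip().lower()
    return media_type if media_type in LISTED_TYPES else "other"


def tally(headers: Iterable[Optional[str]]) -> Counter:
    """Count responses per table row."""
    return Counter(content_type_bucket(h) for h in headers)
```

Under this scheme an application/rdf+xml response, for instance, would be counted as other.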

Page 17: PID Signposting Pattern

Content Length of "200 OK" shouldbeDOI Resources, Year 2012

Content Length | arXiv | PMC
1-50 k | 6,084 | 7,215
50-100 k | 772 | 12,804
100-150 k | 225 | 4,835
150-200 k | 33 | 7,885
200+ k | 216 | 9,423
chunked | 4,100 | 20,596
none provided | 10,544 | 12,096
total | 21,974 | 74,854
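
The Content Length rows can be derived from response headers in the same spirit. This sketch assumes that "k" means 1,000 bytes, that each range includes its lower bound and excludes its upper bound, and that "chunked" denotes responses sent with Transfer-Encoding: chunked and no Content-Length; none of these choices is stated on the slide, and content_length_bucket is a hypothetical name.

```python
from typing import Mapping

KB = 1000  # assumption: the slide's "k" is read as 1,000 bytes

# (exclusive upper bound in bytes, row label), checked in order.
BUCKETS = (
    (50 * KB, "1-50 k"),
    (100 * KB, "50-100 k"),
    (150 * KB, "100-150 k"),
    (200 * KB, "150-200 k"),
)


def content_length_bucket(headers: Mapping[str, str]) -> str:
    """Map the headers of a "200 OK" response to a Content Length row."""
    h = {name.lower(): value for name, value in headers.items()}  # names are case-insensitive
    length = h.get("content-length", "").strip()
    if length.isdigit():
        for upper, label in BUCKETS:
            if int(length) < upper:
                return label
        return "200+ k"
    if "chunked" in h.get("transfer-encoding", "").lower():
        return "chunked"  # body streamed in chunks, so no length is announced
    return "none provided"
```

Under these assumptions, the Content-Length values 1210 and 62959 that appear in the curl responses of this deck land in the 1-50 k and 50-100 k rows respectively.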

Page 18: PID Signposting Pattern

Top Target baseURLs for shouldbeDOI Resources, 1997-2012

arXiv | PMC
ams.org | biomedcentral.com
adsabs.harvard.edu | scripts.iucr.org
link.aps.org | ncbi.nlm.nih.gov
stacks.aip.org | frontiersin.org
link.aip.org | ccforum.com
emis.de | nar.oxfordjournals.org
springerlink.com | nature.com
jstor.org | elsevier.com
ncbi.nlm.nih.gov | jcb.org
sciencemag.org | jmir.org

Page 19: PID Signposting Pattern

Outline

• A Disconcerting Observation

• A Proposed Fix Using Signposting

• Signposting, The Bigger Picture

• Additional Signposting Patterns

Page 20: PID Signposting Pattern

Status Quo

• The PID URI is not in the browser’s address bar when at:
  • The landing page
  • The PDF
  • The dataset
  • Any web resource that is part of the PID-identified object

• Status quo:
  • Provide the PID URI in a copy/paste-able manner on the landing page
  • Provide the PID URI in a downloadable citation
  • Embed the PID URI in an XMP container

• Desired: the ability for tools to uniformly discover the PID URI when at any web resource that is part of a PID-identified object

Page 21: PID Signposting Pattern

HTTP Links

Mark Nottingham (2010) RFC 5988: Web Linking. http://tools.ietf.org/rfc/rfc5988.txt

Page 22: PID Signposting Pattern

HTTP Links

Page 23: PID Signposting Pattern

HTTP Links

Page 24: PID Signposting Pattern

HTTP Links Are Used

curl -I http://dbpedia.org/data/Reykjavik

HTTP/1.1 200 OK
Date: Thu, 27 Oct 2016 04:43:28 GMT
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 1210
Link: <http://creativecommons.org/licenses/by-sa/3.0> ; rel="license", <http://dbpedia.org/data/Reykjavik> ; rel="alternate"; type="text/n3", <http://dbpedia.org/resource/Reykjavik>; rel="describes", <http://mementoarchive.lanl.gov/dbpedia/timegate/http://dbpedia.org/data/Reykjavik> ; rel="timegate"
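
A client reads such a header by splitting it into link-values and collecting each one's parameters. A minimal Python parser, written for this illustration rather than taken from any library; it assumes well-formed input like the dbpedia response and does not decode RFC 5987 extended values such as title*. If a response carries several Link headers, they would be joined with ", " first.

```python
import re
from typing import Dict, List

# One parameter of a link-value, e.g. ; rel="alternate" or ; type=text/n3
PARAM = re.compile(r';\s*([\w*.-]+)\s*=\s*(?:"([^"]*)"|([^;]*))')


def split_link_values(header: str) -> List[str]:
    """Split a Link header on commas that are outside <...> and "..."."""
    values, current = [], []
    in_uri = in_quote = False
    for ch in header:
        if ch == "<" and not in_quote:
            in_uri = True
        elif ch == ">" and not in_quote:
            in_uri = False
        elif ch == '"' and not in_uri:
            in_quote = not in_quote
        if ch == "," and not (in_uri or in_quote):
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return [v.strip() for v in values if v.strip()]


def parse_link_header(header: str) -> List[Dict[str, str]]:
    """Parse a Link header into dicts with a 'target' key plus one key per parameter."""
    links = []
    for value in split_link_values(header):
        start, end = value.index("<"), value.index(">")  # assumes well-formed input
        link = {"target": value[start + 1:end]}
        for m in PARAM.finditer(value[end + 1:]):
            quoted, token = m.group(2), m.group(3)
            link[m.group(1).lower()] = quoted if quoted is not None else token.strip()
        links.append(link)
    return links


def links_with_rel(links: List[Dict[str, str]], rel: str) -> List[Dict[str, str]]:
    """Select links whose rel (space-separated, case-insensitive) includes rel."""
    return [l for l in links if rel.lower() in l.get("rel", "").lower().split()]
```

The rel value is split on whitespace because a single link may carry several relation types.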

Page 29: PID Signposting Pattern

HTTP Link Relation Types

• Registered in the IANA registry
  • Strings, e.g. license, alternate, describes, timegate
  • Require a formal specification, e.g. an RFC
  • Typically used for common, generically specified relationships
  • Provide broad, coarse-grained interoperability

• Minted by a community
  • URIs, e.g. http://xmlns.com/foaf/0.1/primaryTopic
  • Require community agreement
  • Can be as specific as desired
  • Can provide community-specific, fine-grained interoperability
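
The syntactic difference between the two kinds can be checked mechanically: registered relation types are short tokens, whereas community-minted ones are absolute URIs. A minimal Python sketch; it only looks at the form of the value and does not consult the IANA registry, so relation_kind (a hypothetical helper) cannot tell whether a token is actually registered.

```python
from urllib.parse import urlsplit

# The registered relation types named on the slide; the full list lives in the
# IANA Link Relations registry, which this sketch does not consult.
SLIDE_EXAMPLES = {"license", "alternate", "describes", "timegate"}


def relation_kind(rel: str) -> str:
    """Return 'extension' for URI relation types, 'registered' otherwise.

    Purely syntactic: registered types are tokens without a URI scheme,
    while extension types are absolute URIs minted by a community.
    """
    return "extension" if urlsplit(rel).scheme else "registered"


if __name__ == "__main__":
    for rel in sorted(SLIDE_EXAMPLES) + ["http://xmlns.com/foaf/0.1/primaryTopic"]:
        print(rel, "->", relation_kind(rel))
```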

Page 30: PID Signposting Pattern

Proposal: Use HTTP Link with identifier Relation Type

curl -I http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html

HTTP/1.1 200 OK
Date: Wed, 26 Oct 2016 12:36:37 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Thu, 19 Nov 2015 14:50:19 GMT
ETag: "205a5e-f5ef-524e5e0ab80c0"
Accept-Ranges: bytes
Content-Length: 62959
Content-Type: text/html; charset=UTF-8
Link: <https://doi.org/10.1045/november2015-vandesompel> ; rel="identifier"

Michael Nelson and Herbert Van de Sompel (2016) Linking to Persistent Identifiers with rel="identifier". http://ws-dl.blogspot.nl/2016/11/2016-11-07-linking-to-persistent.html
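
A client-side sketch of this proposal in Python, using only the standard library. It is illustrative, not a reference implementation: the regular expression assumes well-formed Link headers whose link-values contain no commas, and identifier_uris and discover_pid are hypothetical helper names.

```python
import re
import urllib.request
from typing import List, Optional

# Matches "<target> ... ; rel=..." inside one link-value; [^,<]*? keeps the
# match from running into the next link-value.
LINK_WITH_REL = re.compile(
    r'<([^>]*)>[^,<]*?;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE
)


def identifier_uris(link_header: str) -> List[str]:
    """Return the targets of all links whose rel includes 'identifier'."""
    found = []
    for m in LINK_WITH_REL.finditer(link_header):
        rel = m.group(2) if m.group(2) is not None else m.group(3)
        if "identifier" in rel.lower().split():
            found.append(m.group(1))
    return found


def discover_pid(url: str, timeout: float = 10.0) -> Optional[str]:
    """HEAD the resource (no content transfer) and return its PID URI, if linked."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A response may carry several Link headers; join them before matching.
        header = ", ".join(response.headers.get_all("Link") or [])
    uris = identifier_uris(header)
    return uris[0] if uris else None
```

Against the D-Lib article above, discover_pid should return the DOI URI as long as the server still sends that Link header. Since many publishers reject HEAD, urlopen would raise an HTTPError there; a more forgiving client might retry with GET, at the cost of transferring the content.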

Page 31: PID Signposting Pattern

Proposal: Use HTTP Link with identifier Relation Type

http://signposting.org/identifier/dryad/

Page 32: PID Signposting Pattern

HTTP Links Are Pretty Neat

• Can be used uniformly for all MIME types

• Accessible via HTTP HEAD (no content transfer):
  • Works for large resources
  • Can work for restricted content
  • Unbelievable but true: many publishers don’t support HEAD

• In many cases, HTTP identifier links can be implemented using simple URI rewrite rules in the web server:
  • The URIs of web resources that are part of a PID-identified object often contain the PID

• The same basic technology can address many other patterns
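
The slide mentions URI rewrite rules in the web server; the same idea can be sketched in Python as WSGI middleware, which is a swapped-in technique rather than anything the slide prescribes. The URL layout /articles/<DOI>/<file> and both function names are hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical URL layout: every resource of an article lives under
# /articles/<DOI>/<file>, e.g. /articles/10.1045/november2015-vandesompel/fulltext.pdf
RESOURCE_PATH = re.compile(r"^/articles/(10\.\d{4,9}/.+)/[^/]+$")


def identifier_link_for(path: str) -> Optional[str]:
    """Return a Link header value pointing to the DOI URI, or None if the path holds no DOI."""
    match = RESOURCE_PATH.match(path)
    if match is None:
        return None
    # Assumes the DOI appears in the path in a form that is already URI-safe.
    return f'<https://doi.org/{match.group(1)}> ; rel="identifier"'


def add_identifier_links(app: Callable) -> Callable:
    """Wrap a WSGI app so responses under the layout carry the identifier link."""
    def wrapped(environ, start_response):
        link = identifier_link_for(environ.get("PATH_INFO", ""))

        def start(status, headers, exc_info=None):
            if link is not None:
                headers = list(headers) + [("Link", link)]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

Because the header is computed from the path alone, the middleware adds it to every response under that layout, whatever the media type. The landing page itself would need to sit at a path such as /articles/<DOI>/index.html to match the pattern.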

Page 33: PID Signposting Pattern

Outline

• A Disconcerting Observation

• A Proposed Fix Using Signposting

• Signposting, The Bigger Picture

• Additional Signposting Patterns

Page 34: PID Signposting Pattern

Signposting the Scholarly Web

http://signposting.org

Page 35: PID Signposting Pattern

Reminiscing About Interoperability for Scholarly Communication

Herbert Van de Sompel and Michael L. Nelson (2015) Reminiscing about 15 years of interoperability efforts. https://doi.org/10.1045/november2015-vandesompel

Page 36: PID Signposting Pattern

I Have Done My Fair Share

OAI-PMH

OAI-ORE

Memento

Shared Canvas

info URI

Open Annotation

ResourceSync

Robust Links

OpenURL

Page 37: PID Signposting Pattern

Research Communication on the Web

• A highly distributed activity

• Try to turn this distributed activity from a gathering of silos into an ecology of collaborating repositories
  • In the web context, this seems a rather unique challenge
  • Most web enterprises want dominance, not collaboration

• Interoperability as an enabler to connect resources from distributed repositories
  • Repositories expose uniform behaviors
  • Multiple parties can interact uniformly with these repositories (and their resources) to create added value

Page 38: PID Signposting Pattern

Tools of the Web-Centric Interoperability Trade

• Resource
• URI
• HTTP as the API: HEAD/GET, POST, PUT, DELETE
• Representation
• Media Type
• Link
• Content Negotiation, e.g. for preferred Media Type

• Typed Link
• Controlled Vocabularies for Typed Links

W3C Architecture of the World Wide Web

Page 39: PID Signposting Pattern

Tools of the Web-Centric Interoperability Trade – RDF Stack

• Resource
• URI
• HTTP as the API: HEAD/GET, POST, PUT, DELETE
• Representation
• Media Type
• Link
• Content Negotiation, e.g. for preferred Media Type

• Typed Link
• Controlled Vocabularies for Typed Links

RDF, RDFS, OWL

W3C Architecture of the World Wide Web

Page 40: PID Signposting Pattern

Interoperability via RDF, RDFS, OWL Stack

Used by various interoperability efforts, e.g. OAI-ORE, Open Annotation, W3C PROV, Research Objects, …

• Provides extensive expressiveness for description

• Typically based on publishing documents that adhere to a certain “profile” and reveal relations, properties, …

• Non-trivial barrier to entry, as illustrated by slow adoption, likely related to the unfamiliar technology stack

Page 41: PID Signposting Pattern

Tools of the Web-Centric Interoperability Trade – HTTP Stack

• Resource
• URI
• HTTP as the API
• Representation
• Media Type
• Link
• Content Negotiation, e.g. for preferred Media Type

• Typed Link
• Controlled Vocabularies for Typed Links

HTTP Links, IANA link relation registry, community link relation types

HATEOAS: Hypermedia As The Engine Of Application State

http://en.wikipedia.org/wiki/HATEOAS

W3C Architecture of the World Wide Web

Page 42: PID Signposting Pattern

Interoperability via HTTP Links, IANA Link Relation Types

Used by Memento, ResourceSync, Signposting the Scholarly Web:

• Provides coarse expressiveness for navigation via IANA-registered relation types (expressed as reserved terms)

• Finer-grained expressiveness via community-defined relation types (expressed as HTTP URIs)

• Typically based on publishing typed links that allow a client to navigate among resources in an informed manner

• Low implementation barrier because of a familiar technology stack

Page 43: PID Signposting Pattern

Outline

• A Disconcerting Observation

• A Proposed Fix Using Signposting

• Signposting, The Bigger Picture

• Additional Signposting Patterns

Page 44: PID Signposting Pattern

Currently at signposting.org

• Identifier pattern

• Publication boundary pattern

• Bibliographic metadata pattern

Page 45: PID Signposting Pattern

Publication Boundary Pattern

http://signposting.org/publication_boundary/oxford/
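
The Link headers involved in this pattern can be generated from a landing page URI and the list of resources inside the publication boundary. This sketch assumes the relation types signposting.org used for the pattern at the time, item from the landing page to each resource and collection from each resource back to the landing page; boundary_links and the example.org URIs are hypothetical.

```python
from typing import Dict, Iterable


def boundary_links(landing_page: str, items: Iterable[str]) -> Dict[str, str]:
    """Map each URI (landing page and items) to the Link header value it should carry."""
    items = list(items)
    headers = {
        # The landing page enumerates every resource inside the publication boundary.
        landing_page: ", ".join(f'<{item}> ; rel="item"' for item in items)
    }
    for item in items:
        # Each resource points back to the landing page that bounds it.
        headers[item] = f'<{landing_page}> ; rel="collection"'
    return headers


if __name__ == "__main__":
    for uri, value in boundary_links(
        "https://example.org/article/1",
        ["https://example.org/article/1.pdf", "https://example.org/article/1.html"],
    ).items():
        print(uri, "->", value)
```

A harvester, such as one capturing resources for digital preservation, could start at the landing page and fetch every item target to collect the whole publication.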

Page 46: PID Signposting Pattern

Bibliographic Metadata Pattern

http://signposting.org/bibliographic_metadata/springer/

Page 47: PID Signposting Pattern

Bibliographic Metadata Pattern

http://signposting.org/conventions/
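
On the client side, this pattern amounts to picking, among the landing page's links, a metadata record in a format the client understands. This sketch assumes the describedby relation type with a type parameter giving the record's media type, and takes links already parsed into dicts with target, rel and type keys; choose_metadata, the example.org URIs and the media type preferences are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def choose_metadata(links: List[Dict[str, str]], preferences: Sequence[str]) -> Optional[str]:
    """Return the target of the describedby link whose type ranks highest in preferences."""
    # Index describedby links by media type (if two share a type, the last one wins).
    candidates = {
        link.get("type", "").lower(): link["target"]
        for link in links
        if "describedby" in link.get("rel", "").lower().split()
    }
    for media_type in preferences:
        if media_type.lower() in candidates:
            return candidates[media_type.lower()]
    return None  # no metadata record in an acceptable format


if __name__ == "__main__":
    links = [
        {"target": "https://example.org/meta/123.bib", "rel": "describedby", "type": "application/x-bibtex"},
        {"target": "https://example.org/meta/123.ris", "rel": "describedby", "type": "application/x-research-info-systems"},
        {"target": "https://example.org/paper/123.pdf", "rel": "item", "type": "application/pdf"},
    ]
    print(choose_metadata(links, ["application/vnd.citationstyles.csl+json",
                                  "application/x-research-info-systems"]))
```

Ranking by an explicit preference list lets each client state its own format priorities, while the server only has to advertise what it offers.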

Page 48: PID Signposting Pattern

Use Case: Resource Capture for Digital Preservation

Herbert Van de Sompel, David Rosenthal, and Michael L. Nelson (2015) Web Infrastructure to Support e-Journal Preservation (and More). http://arxiv.org/abs/1605.06154

Page 49: PID Signposting Pattern

Expected at signposting.org

• Author pattern
  • author link from the DOI URI to the ORCID URI
  • author link from the landing page to the ORCID URI

• License pattern
  • license link from web resources that are part of a scholarly object to the appropriate license URI

• Resource type pattern
  • type relation type on the web resource itself
  • sem-type attribute on links to a web resource
  • URIs to express resource types
  • Which? How coarse/fine grained?
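
The anticipated author, license and resource type patterns all reduce to emitting extra typed links. A minimal Python serializer for such links, assuming the registered relation types author, license and type; the ORCID URI is the one from the title slide, while the license and type URIs in the usage example are hypothetical placeholders, and serialize_links is an illustrative name.

```python
from typing import Iterable, Tuple


def serialize_links(links: Iterable[Tuple[str, str]]) -> str:
    """Serialize (target URI, relation type) pairs into one Link header value."""
    parts = []
    for target, rel in links:
        # Angle brackets would end the target early; quotes would break rel="...".
        if "<" in target or ">" in target or '"' in rel:
            raise ValueError(f"cannot serialize link {target!r} ; rel={rel!r}")
        parts.append(f'<{target}> ; rel="{rel}"')
    return ", ".join(parts)


if __name__ == "__main__":
    print(serialize_links([
        ("http://orcid.org/0000-0002-0715-6126", "author"),         # author pattern
        ("http://creativecommons.org/licenses/by/4.0/", "license"),  # license pattern
        ("http://schema.org/ScholarlyArticle", "type"),              # resource type pattern
    ]))
```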

Page 50: PID Signposting Pattern

Resource Type Pattern
