Providing Semantic and Bibliographic Data for
Library Discovery
Cathy Dolbear
Senior Link Architect, Data Strategy
Oxford University Press
Semantic and Bibliographic Data for Library Discovery / Cathy Dolbear 16th October 2015
Overview
• Introduction to OUP
• Our relationship with libraries as a publisher
• Industry trends for publication and delivery of metadata
• Semantic versus Bibliographic Metadata
• Where next?
Introduction to OUP
Meet the Press…
How library users find our content
[Diagram: OUP and other publishers; Academic Catalogue (print + online), Journals, Online Products; Bibliographic and Semantic (“authorities”) metadata; Library Management System; Discovery Services; search engines, specialised databases; User]
Our relationship with libraries
• Customers
– but not users
– make purchasing decisions based on metadata-driven usage statistics
• Discovery portals
– Discovery services (ProQuest, ExLibris, OCLC etc) => XML metadata feeds
• But…
– Main referrers are search engines (Google/Scholar, Bing, Yahoo!) => GS markup/RDFa/JSON-LD
– Users arrive via direct links (NIH PubMed, escardio) => entity recognition
Customers; Discovery Portals
Discovery data
• Entry Referrers for a particular university consortium
Many ways to slice the pie
How should we provide our metadata?
Industry trends for publication and delivery
• “Push” - direct metadata deliveries
• “Pull” – metadata publishing
– Linked Data Publishing platform?
[Diagram: OxMetaML; OAI-PMH: MARC 21, Dublin Core; JSON-LD]
• Publishers can’t just choose a single vocabulary/format
– Too many differing requirements/options
– Transform on delivery as required
• Simplify our internal metadata format
– Bibliographic & semantic information only
– Removing processing instructions
– Clear semantics makes linking/integration easier
– More likely we can publish our metadata without transformation
Decentralised web world: no single “standard” vocabulary
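A minimal sketch of "transform on delivery", assuming a simplified internal record held as a Python dictionary. The record fields, the DOI and the function names are illustrative assumptions, not OUP's actual OxMetaML model; only the Dublin Core namespace and the schema.org terms are real.

```python
import json
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical internal record: bibliographic and semantic fields only,
# with no processing instructions. All values are made up.
record = {
    "title": "An example article",
    "creators": ["A. Author", "B. Writer"],
    "identifier": "doi:10.0000/example",
    "subjects": ["Alcoholism"],
}

def to_dublin_core(rec):
    """Serialise the record as simple Dublin Core XML (a 'push' feed format)."""
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = rec["title"]
    for name in rec["creators"]:
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = name
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = rec["identifier"]
    for subject in rec["subjects"]:
        ET.SubElement(root, f"{{{DC_NS}}}subject").text = subject
    return ET.tostring(root, encoding="unicode")

def to_json_ld(rec):
    """Serialise the same record as schema.org JSON-LD (a 'pull' format)."""
    doc = {
        "@context": "http://schema.org",
        "@type": "ScholarlyArticle",
        "name": rec["title"],
        "author": [{"@type": "Person", "name": n} for n in rec["creators"]],
        "about": [{"@type": "Thing", "name": s} for s in rec["subjects"]],
    }
    return json.dumps(doc, indent=2)

print(to_dublin_core(record))
print(to_json_ld(record))
```

Both outputs are derived from one source record, so adding a further format means adding one more function rather than changing the internal model.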
What information do we provide?
Bibliographic versus Semantic metadata
• Bibliographic information (author, title, ISBN etc)
• Semantic or contextual information - what the document is about (academic subject, person, organisation etc)
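One way to picture the distinction is as two separate parts of a record. The sketch below is only an illustration: the class names, field names and all values (including the ISBN) are hypothetical and not taken from any OUP schema.

```python
from dataclasses import dataclass, field

@dataclass
class Bibliographic:
    """Describes the publication itself."""
    title: str
    authors: list
    isbn: str

@dataclass
class Subject:
    """Describes something the publication is about."""
    label: str
    kind: str             # e.g. "academic subject", "person", "organisation"
    identifier: str = ""  # optional link to an authority record

@dataclass
class Record:
    bibliographic: Bibliographic
    about: list = field(default_factory=list)

# Hypothetical example values, for illustration only.
record = Record(
    bibliographic=Bibliographic(
        title="An example monograph",
        authors=["A. Author"],
        isbn="978-0-00-000000-0",
    ),
    about=[
        Subject(label="Example Person", kind="person"),
        Subject(label="History", kind="academic subject"),
    ],
)

for subject in record.about:
    print(f"{record.bibliographic.title} is about {subject.label} ({subject.kind})")
```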
Which vocabularies/ontologies?
so many standards to choose from…
• Semantic data integration is not straightforward
Why didn’t you use …?
[insert name of favourite vocabulary here]
• Semantic data integration is not straightforward
[Diagram: frbr:CorporateBody, frbr:Person, frbr:Object, frbr:Work]
Metadata publishing
Embedded markup in HTML
• Google Scholar meta-tags
– HighWire Press/PRISM tagged bibliographic data
– Full text indexed (unlike Google)
• RDFa (RDF in attributes)
– Currently published on our non-journals online products
– Using schema.org vocabulary
– RDFa distillers can scrape the metadata
– Only in HTML header => not fully recognised by Google
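As a rough illustration of the Google Scholar meta-tag approach, the sketch below renders HighWire-style `citation_*` tags for an article page. The function name and the sample values are assumptions made for the example; the tag names are the commonly documented ones.

```python
from html import escape

def scholar_meta_tags(title, authors, publication_date, journal_title):
    """Render HighWire-style citation_* meta tags for an HTML <head>."""
    pairs = [("citation_title", title)]
    # One citation_author tag per author, in order.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    pairs.append(("citation_journal_title", journal_title))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )

# Hypothetical article details, for illustration only.
print(scholar_meta_tags(
    title='A "quoted" title',
    authors=["A. Author", "B. Writer"],
    publication_date="2015/10/16",
    journal_title="Example Journal",
))
```

Attribute values are HTML-escaped so that quotation marks or ampersands in a title cannot break the tag.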
Metadata publishing
JSON-LD
• Our RDFa not fully recognised by Google
– at the document, not object level
• Still want structured markup
– Improves click-through rate (30% reported by BestBuy)
– Search results more eye-catching as rich snippets
– Increases traffic (BBC reported 20%)
– Content indexed better
• Developing solution using JSON-LD
Metadata publishing
JavaScript Object Notation for Linked Data (JSON-LD)
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MedicalScholarlyArticle",
  "publicationType": "V03.200",
  "doi": "10.1093/brain/awl303",
  "author": {
    "@type": "Person",
    "affiliation": "University of Wurzburg, Department of Neuroradiology, Wurzburg",
    "name": "Andreas J Bartsch"
  },
  "keywords": "alcoholism, morphometry, MR spectroscopy, SIENA, voxelwise SIENA statistics",
  "about": {
    "@type": "MedicalCondition",
    "name": "Alcoholism",
    "code": {
      "@id": "http://www.ncbi.nlm.nih.gov/mesh/C25.775.100.250",
      "@type": "MedicalCode",
      "code": "C25.775.100.250",
      "codingSystem": "MeSH"
    }
  }
}
</script>
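Markup like this is easier to keep valid when it is generated rather than typed by hand. A minimal sketch, assuming the article metadata is already available as a Python dictionary (the function name `to_script_tag` is made up for this example; the values are those shown on the slide):

```python
import json

article = {
    "@context": "http://schema.org",
    "@type": "MedicalScholarlyArticle",
    "publicationType": "V03.200",
    "doi": "10.1093/brain/awl303",
    "author": {
        "@type": "Person",
        "affiliation": "University of Wurzburg, Department of Neuroradiology, Wurzburg",
        "name": "Andreas J Bartsch",
    },
    "about": {
        "@type": "MedicalCondition",
        "name": "Alcoholism",
        "code": {
            "@id": "http://www.ncbi.nlm.nih.gov/mesh/C25.775.100.250",
            "@type": "MedicalCode",
            "code": "C25.775.100.250",
            "codingSystem": "MeSH",
        },
    },
}

def to_script_tag(data):
    """Wrap a dictionary as a JSON-LD script element for an HTML page."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(to_script_tag(article))
```

The serialiser takes care of straight quotation marks, matching braces and the absence of trailing commas, which are the usual ways hand-written JSON-LD fails to parse.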
Parts of the markup highlighted in turn:
• "@context" and "@type"
• "publicationType" and "doi"
• "author"
• "about"
Entity recognition
• Wikidata have aligned Dictionary of National Biography people to VIAF
• Semantic enrichment programme underway, starting with medical entities – tagged with UMLS codes
in-line content markup
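In-line tagging of this kind can be pictured with a simple dictionary lookup. The sketch below is an assumption-laden toy, not the enrichment pipeline described in the talk: the lexicon, the `EXAMPLE:` identifiers (placeholders, not real UMLS codes) and the `span` markup are all invented for illustration, and the input is assumed to be plain text without HTML special characters.

```python
import re

# Hypothetical term-to-identifier table. The identifiers are placeholders,
# NOT real UMLS codes.
LEXICON = {
    "alcoholism": "EXAMPLE:0001",
    "mr spectroscopy": "EXAMPLE:0002",
}

def tag_entities(text):
    """Wrap each known term in a span carrying its identifier."""
    # Longest terms first, so multi-word terms win over shorter overlaps.
    terms = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        surface = match.group(1)
        code = LEXICON[surface.lower()]
        return f'<span class="entity" data-code="{code}">{surface}</span>'

    return pattern.sub(wrap, text)

print(tag_entities("Alcoholism and MR spectroscopy"))
```

A real tagger would also need disambiguation, since the same surface string can refer to different entities; the lookup here only shows where the identifier ends up in the content.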
Where next?
Mainstream consumption of bibliographic linked data
• Publisher-supplied metadata
– Simple, clean semantic and bibliographic data model
– Output to multiple standards/formats in the interim
– Increase tagging of our content/ entity linking
– Providing semantic disambiguation
• Requirements mainly driven by web search engines so far
• If we publish linked data, will it be incorporated into library search and indexing systems?
Any Questions?