lodlam landscape notes
TRANSCRIPT
NISO Virtual Conference: BIBFRAME & Real World Applications of Linked Bibliographic Data
http://www.niso.org/news/events/2016/virtual_conference/jun15_virtualconf/
June 15, 2016
Keynote: Landscape and Current Status of BIBFRAME and Related Initiatives
“The LODLAM Landscape”
SLIDE 1: The LODLAM Landscape
Subtitle: BIBFRAME & Other Linked Data Initiatives
SLIDE 2: Why LODLAM?
First, a definition:
LODLAM: linked open data in libraries, archives, and museums
Libraries – used here as shorthand for libraries, archives, and museums (i.e. cultural heritage institutions)
Why is Linked Data important?
Libraries – trusted repositories of information; the data we have is rich and deep
Linked data – how we can share that information on the web; making things more discoverable and accessible
So what’s the problem?! We’ve got this data, so why isn’t it just thrown out there?
Our data is siloed in MARC
MARC
we’ve been making it do something it wasn’t designed to do for 20+ years now
designed to purely transmit data for printing; NOT for being indexed, searched, manipulated, etc.
MARC is contextual (hello, ISBD) – machines don’t do context, they are dumb and literal
Henriette Avram – one of my heroes (sheroes) – BUT not a storage/retrieval format; for transmitting and display only
BIBFRAME is one effort to break that silo.
SLIDE 3: Terms and Definitions
Semantic web (W3C): “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries”
Tim Berners-Lee – defined the semantic web/linked data
Semantic web is powered by Linked Data
Linked Data is structured data that is machine readable/actionable
SLIDE 4: [CATS]
Of course, we all know the internet (aka the web) is actually made of a series of tubes filled with CATS, not data.
Sources: Subversive Cross Stitch; Pusheen.com; Nyan cat; Threadless – “Voltron” of cats
SLIDE 5: [xkcd]
Well, OK. It’s not really cats in tubes. The web is actually a bunch of standards – that keep proliferating – but must be interoperable
http://xkcd.com/927/
Alt text: charging issue solved. Wait, is it: mini-USB? Micro-USB?
SLIDE 6: (Semantic Web)
Semantic web – web of linked data
powered by rules/standards and technologies that provide structure to data
*Various (most commonly used/heard) standards and organizations that power the Semantic Web; most used as part of or by linked data
W3C: World Wide Web Consortium
URI: uniform resource identifier (a URL is one type of URI; URNs are another)
HTTP: hypertext transfer protocol – allows a URI to be actionable (“linkable” or “clickable”)
XML/HTML: mark-up languages
Microdata: nest metadata within existing web page content (embedded in HTML)
JSON: JavaScript Object Notation – data format for transmitting data objects (attribute-value pairs)
  o JSON-LD – method of encoding linked data using JSON
RDF: Resource Description Framework – conceptual description or modeling of information used in web resources; multiple RDF specifications (entity-relationship); most other semantic web standards are built on or use RDF
Turtle: Terse RDF Triple Language
SPARQL: semantic query language (uses RDF); endpoint for access
SKOS: Simple Knowledge Organization System (for controlled vocabulary representation)
OWL: Web Ontology Language (knowledge representation language for ontologies (vocabularies/taxonomies))
FOAF: “friend of a friend” (ontology for describing persons)
Schema.org: Bing/Google/Yahoo – schemas for structured data markup on web pages
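To show how a couple of these pieces fit together, here is a minimal sketch of a JSON-LD document built with Python’s standard json module. It is only an illustration: the example.org URIs and the title are made up, and the choice of Dublin Core terms as the vocabulary is an assumption, not something from the slides.

```python
import json

# A tiny JSON-LD document: ordinary JSON plus a few reserved "@" keys.
# "@context" maps the short prefix "dcterms" to the full vocabulary URI,
# "@id" names the thing being described with a URI.
doc = {
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/book/1",          # hypothetical URI
    "dcterms:title": "An Example Title",         # a literal (string)
    "dcterms:creator": {"@id": "http://example.org/person/1"},  # a link to another thing
}

# Serialize for transmission, then read it back as any JSON consumer would.
serialized = json.dumps(doc, indent=2)
roundtrip = json.loads(serialized)

print(serialized)
```

The point of the sketch is that JSON-LD is still plain JSON, so existing JSON tooling can carry it; the linked-data meaning comes from the @context and @id keys.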
SLIDE 7: (Semantic) Web: LODLAM Subset
Other common terms and organizations within the LODLAM community:
BIBFRAME: data model for bibliographic description using linked data principles
Blacklight: discovery interface for Solr index
  o http://projectblacklight.org/
Solr: open source standalone enterprise search server; powerful indexing
  o http://lucene.apache.org/solr/
Fedora: Flexible Extensible Digital Object Repository Architecture; open source architecture for storing, managing, and accessing digital content in the form of digital objects
  o http://fedora-commons.org/
Hydra: repository solution; “Hydra is an ecosystem of components that lets institutions deploy robust and durable digital repositories (the body) supporting multiple “heads”: fully-featured digital asset management applications and tailored workflows. Its principle platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight”
  o http://projecthydra.org/
ORCID: open, non-profit, community-driven effort to create and maintain a registry of unique researcher identifiers
  o http://orcid.org/
VIVO: member-supported, open source software and an ontology for representing scholarship
  o http://vivoweb.org/
PLUS existing controlled vocabularies such as LCSH, LCClassification, Dewey, etc.
SLIDE 8: Linked Data
Powers the semantic web
STRUCTURED data
can be queried, linked to/from, and integrated
builds on existing web technologies
machine (not human) friendly – actionable by machine
uses controlled vocabularies
structured data stored in various interoperable “containers” (mark-up, schemas, etc.)
Structure of linked data: triples:
a subject, a predicate, and an object – defines relationship between two things (entity-relationship model)
Best case: each element of a triple is a URI, then each element can connect with other elements
URI – dereferenceable (actionable/clickable)
Controlled vocabularies and identities are “things” represented by URIs – multiple sources will “connect through” one URI, providing the “web of data”
Things: represented by URIs; they are actionable and connect to other things/strings
Strings: text; also called literals; “dead ends” in the semantic web/linked data
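To make the things-versus-strings idea concrete, here is a minimal sketch in Python. It is an illustration only: the example.org URIs, the title and name values, and the helper names (Triple, is_uri, follow) are all assumptions; the Dublin Core and FOAF predicate URIs are real vocabulary terms.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def is_uri(term: str) -> bool:
    """Simplification: anything that looks like an HTTP(S) URI is a 'thing'."""
    return term.startswith(("http://", "https://"))


# Hypothetical data: one book, its creator, and two literals.
triples = [
    Triple("http://example.org/book/1", "http://purl.org/dc/terms/creator",
           "http://example.org/person/1"),
    Triple("http://example.org/book/1", "http://purl.org/dc/terms/title",
           "An Example Title"),
    Triple("http://example.org/person/1", "http://xmlns.com/foaf/0.1/name",
           "Example Author"),
]


def follow(start: str, data: List[Triple]) -> Set[str]:
    """Collect every thing reachable from start by following URI objects."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for t in data:
            # Only URI objects can be followed; literals are dead ends.
            if t.subject == node and is_uri(t.obj):
                stack.append(t.obj)
    return seen


things = follow("http://example.org/book/1", triples)
strings = [t.obj for t in triples if not is_uri(t.obj)]
```

Starting from the book, the walk reaches the person because the creator is a URI; the title and the name are strings, so nothing can be followed from them.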
SLIDE 9: Triples and Quads
TRIPLE
a subject, a predicate, and an object – defines relationship between two things (entity-relationship model)
QUAD
a subject, a predicate, an object, and context
Allows relationships to have attributes – or information about the relationship
Assertion: one way to represent “trust” or “authenticate” the relationship; “who” or “what” established this relationship?
ALL of the pieces of both TRIPLES and QUADS can be URIs; any piece that is not a URI is a “dead-end” or literal/string
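A quad can be sketched the same way as a triple, with a fourth slot for context. The sketch below is an assumption-laden illustration: the example.org URIs, the two “library” sources, and the helper names (Quad, group_by_context, who_asserted) are hypothetical; only the Dublin Core predicate URIs are real.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # who or what asserted this relationship


CREATOR = "http://purl.org/dc/terms/creator"
TITLE = "http://purl.org/dc/terms/title"
BOOK = "http://example.org/book/1"

# Hypothetical data: two sources disagree about the creator of one book.
quads = [
    Quad(BOOK, CREATOR, "http://example.org/person/1",
         "http://example.org/source/library-a"),
    Quad(BOOK, CREATOR, "http://example.org/person/2",
         "http://example.org/source/library-b"),
    Quad(BOOK, TITLE, "An Example Title",
         "http://example.org/source/library-a"),
]


def group_by_context(data: List[Quad]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Split quads into one list of plain triples per asserting context."""
    graphs: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for q in data:
        graphs[q.context].append((q.subject, q.predicate, q.obj))
    return dict(graphs)


def who_asserted(data: List[Quad], subject: str, predicate: str, obj: str) -> List[str]:
    """Return the contexts that made a given statement."""
    return sorted(q.context for q in data
                  if (q.subject, q.predicate, q.obj) == (subject, predicate, obj))


graphs = group_by_context(quads)
sources = who_asserted(quads, BOOK, CREATOR, "http://example.org/person/2")
```

With the context in hand, a consumer can decide which source to trust for a conflicting statement instead of seeing two anonymous, contradictory triples.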
SLIDE 10: Linked Open Data [mug]
Linked data is not guaranteed to be OPEN; you can have linked data in a closed network
However, OPEN is preferred
Why?
Tim Berners-Lee principles of linked data
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
3. Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-based names when publishing data on the Web – e.g. link to other URIs so more things can be discovered.
5. OPEN data/content
But what makes it OPEN?
OPEN data – “star” system – goal is 5 stars
1. Available on the web (whatever format) but with an open license, to be Open Data
2. Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
3. As (2) plus non-proprietary format (e.g. CSV instead of Excel)
4. Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
5. Link your data to other people’s data to provide context
Source: W3C (https://www.w3.org/DesignIssues/LinkedData.html)
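Because each star builds on the one before it, the rating can be worked out mechanically. A minimal sketch, assuming a dataset is described by five yes/no answers (the Dataset class, its field names, and the star_rating helper are hypothetical, made up for this illustration):

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    open_license_on_web: bool     # star 1
    machine_readable: bool        # star 2
    non_proprietary_format: bool  # star 3
    uses_w3c_standards: bool      # star 4 (RDF, SPARQL, URIs)
    links_to_other_data: bool     # star 5


def star_rating(d: Dataset) -> int:
    """Count stars in order; stop at the first criterion that is not met."""
    stars = 0
    for met in (d.open_license_on_web, d.machine_readable,
                d.non_proprietary_format, d.uses_w3c_standards,
                d.links_to_other_data):
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV file on the web: structured and non-proprietary,
# but no RDF/URIs and no links out to other data.
csv_on_web = Dataset(True, True, True, False, False)
# An openly licensed scanned image of a table.
scanned_table = Dataset(True, False, False, False, False)

csv_stars = star_rating(csv_on_web)
scan_stars = star_rating(scanned_table)
```

The early break encodes the cumulative nature of the scheme: a dataset cannot claim star 4 without first satisfying stars 1 through 3.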
SLIDE 11: Linked Open Data Cloud
http://lod-cloud.net/
Aug. 2014
CC-BY-SA license
Representation of the Linked Open Data cloud – still growing as more data sets are published
Each node is a dataset that can be queried, linked to, and extracted from
Center of the hub is DBpedia
connects other data sources – interlinking published data sets
Upper right – coded green – library data
id.loc.gov
VIAF
WorldCat
national libraries – French, German, and more
geographic data
SLIDE 12: BIBFRAME
Bibliographic Framework Initiative
Initiative defined in 2011
Source: https://www.loc.gov/bibframe/faqs/
Releasing our data from the MARC silo, but beyond that too
SLIDE 13:
INDEPENDENT of any specific description standard (e.g. RDA or Resource Description and Access)
Is it replacing MARC? Yes, but also going beyond MARC
Focused on RELATIONSHIPS
Can look at it as: “blowing up” a MARC record into its discrete elements and defining their relationship with each other and beyond (connect with elements not traditionally included)
Classes: define a resource (under BIBFRAME)
Properties: further describe a resource
SLIDE 14: BIBFRAME 2.0 Model
Work – conceptual essence
Instance – embodiment of a work (one work can have multiple instances)
Item – copy of the instance – local holdings
Agents – people, organizations, jurisdictions, etc. associated with a work
Subjects – “aboutness” of a work
Events – occurrences, the recording of which may be the content of a work
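The Work / Instance / Item layering can be sketched as a handful of triples. This is an illustration under stated assumptions: the example.org resources are hypothetical and the helper names (subjects_of, items_for_work) are made up; the class and property names (Work, Instance, Item, instanceOf, itemOf) are taken from the BIBFRAME 2.0 vocabulary namespace at id.loc.gov.

```python
from typing import List, Tuple

BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical local namespace

# One work, two instances (print and e-book), one held copy of the print.
triples: List[Tuple[str, str, str]] = [
    (EX + "work/1", RDF_TYPE, BF + "Work"),
    (EX + "instance/print", RDF_TYPE, BF + "Instance"),
    (EX + "instance/print", BF + "instanceOf", EX + "work/1"),
    (EX + "instance/ebook", RDF_TYPE, BF + "Instance"),
    (EX + "instance/ebook", BF + "instanceOf", EX + "work/1"),
    (EX + "item/copy1", RDF_TYPE, BF + "Item"),
    (EX + "item/copy1", BF + "itemOf", EX + "instance/print"),
]


def subjects_of(predicate: str, obj: str) -> List[str]:
    """All subjects that point at obj through predicate, in sorted order."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def items_for_work(work: str) -> List[str]:
    """Walk Work -> Instances -> Items to find local holdings of a work."""
    return [item
            for instance in subjects_of(BF + "instanceOf", work)
            for item in subjects_of(BF + "itemOf", instance)]


instances = subjects_of(BF + "instanceOf", EX + "work/1")
holdings = items_for_work(EX + "work/1")
```

The holdings question is answered by following relationships between separate resources, which is the “blown up” MARC record idea: one record becomes several linked things.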
SLIDE 15: All the Linked Data Activity!
Review the current initiatives/activities
definitions and the active participants; more detail about the various activities will be included in the presentations today
NOT an exhaustive list
Currently largely independent of each other
Why? No shared triple store; infrastructure is a barrier for larger cooperative participation
LODLAM: linked open data in libraries, archives, and museums
  o “informal, borderless network of enthusiasts, technicians, professionals and any number of other people who are interested in or working with Linked Open Data pertaining to galleries, libraries, archives, and museums”
  o http://lodlam.net/
BIBFRAME: initiative for bibliographic data
  o Library of Congress
  o https://www.loc.gov/bibframe/ (bibframe.org LOC site now)
BIBFRAME Lite: modular and layered vocabulary approach using BIBFRAME vocabulary
  o core set of classes/properties as a scaffolding; build on it with other vocabularies (Library, Archive, Rare Materials, etc.)
  o National Library of Medicine; George Washington U; Zepheira
  o http://www.bibfra.me/
LD4PE: Linked Data for Professional Educators
  o educate the educators; building an Exploratorium of learning resources and defined competencies
  o University of WA; Kent State U; Dublin Core Metadata Initiative (DCMI); Sungkyunkwan University (Korea); OCLC; Elsevier; Synaptica
  o http://wiki.dublincore.org/index.php/Pet/ld4pe
LD4L: Linked Data for Libraries
  o 2014-2016 Mellon Grant ($1 million)
  o Cornell U Library; Harvard Library; Stanford U Libraries
  o https://wiki.duraspace.org/pages/viewpage.action?pageId=41354028
LD4L Labs: continue LD4L
  o “$1.5 million dollar grant from the Andrew W. Mellon Foundation, Linked Data for Libraries: LD4L Labs is a collaboration of Cornell, Harvard, Iowa, and Stanford to continue to advance the use and usefulness of linked data in libraries”
  o https://wiki.duraspace.org/display/ld4l/LD4L+Labs
LD4P: Linked Data for Production
  o LC; Columbia; Cornell; Harvard; Princeton; Stanford
  o Multiple projects – each of the 6 core members contributes projects
  o https://wiki.duraspace.org/pages/viewpage.action?pageId=74515029
  o http://www.loc.gov/aba/pcc/documents/PCC-LD4P.docx
BIBFLOW: “Reinventing Cataloging: Models for the Future of Library Operations”
  o focus is on workflows
  o UC Davis Library; Zepheira
  o https://www.lib.ucdavis.edu/bibflow/
LibHub: converting library MARC data to BIBFRAME and linked data formats; publishing and hosting the resulting content
  o Zepheira
  o http://www.libhub.org/
CLDI: Canadian Linked Data Initiative
  o U of Toronto; McGill U; Université de Montréal; U of Alberta; U of British Columbia; Library & Archives Canada; Bibliothèque et Archives nationales du Québec; Canadiana
  o https://connect.library.utoronto.ca/display/U5LD/Canadian+Linked+Data+Initiative+Home
LC Linked Data Service
  o provides access to commonly found standards and vocabularies promulgated by the Library of Congress
  o http://id.loc.gov/
OCLC: multiple initiatives (W3C Linked Data Platform; BIBFRAME; Schema.org; Schema.org Extend W3C Group; OCLC Works)
  o WorldCat in Schema.org
  o FAST: Faceted Application of Subject Terminology; http://fast.oclc.org/
  o VIAF: Virtual International Authority File; http://viaf.org/
  o WorldCat Entities (or Works); https://www.oclc.org/developer/develop/linked-data/worldcat-entities/worldcat-work-entity.en.html
  o https://www.oclc.org/en-US/data.html
  o https://www.oclc.org/developer/develop/linked-data.en.html
ADDENDUM:
Library Link Network
  o Zepheira
  o “Seeding the Web with Library locations, services, and content.”
  o http://library.link/
Linked Data Collaboration Program
  o Ex Libris
  o Jan. 2016 press release; involves 30+ institutions
  o http://www.exlibrisgroup.com/category/LinkedDataDiscussionPaper
  o http://www.exlibrisgroup.com/files/Publications/LinkedDataattheServiceofLibraries.pdf
IGELU/ELUNA Special Interest Group on Linked Open Data
  o “achieve essential linked open data features in all Ex Libris products where appropriate, both from the data publishing, the data consuming and the data integration perspective.”
  o http://igelu.org/special-interests/lod
Blue Cloud Visibility
  o SirsiDynix
  o service to extract MARC records and transform into BIBFRAME; enhanced with geographic data
  o http://www.sirsidynix.com/products/bluecloud-visibility
SLIDE 16: [business cat image]
Questions?
“business cat” meme
http://es.memegenerator.net/Business-Cat
SLIDE 17: [xkcd]
https://xkcd.com/262/
ALT TEXT: “hey, at least I ran out of staples”