LODLAM Landscape Notes


Upload: shana-mcdanold

Post on 14-Apr-2017


Page 1: LODLAM Landscape NOTES

NISO Virtual Conference: BIBFRAME & Real World Applications of Linked Bibliographic Data
http://www.niso.org/news/events/2016/virtual_conference/jun15_virtualconf/
June 15, 2016

Keynote: Landscape and Current Status of BIBFRAME and Related Initiatives
“The LODLAM Landscape”

SLIDE 1: The LODLAM Landscape
Subtitle: BIBFRAME & Other Linked Data Initiatives

SLIDE 2: Why LODLAM?
First, a definition:
LODLAM: linked open data in libraries, archives, and museums
Libraries – used here as shorthand for libraries, archives, and museums (i.e. cultural heritage institutions)
Why is Linked Data important?

Libraries – trusted repositories of information; the data we have is rich and deep

Linked data – how we can share that information on the web; making things more discoverable and accessible

So what’s the problem?! We’ve got this data, so why isn’t it just thrown out there?

Our data is siloed in MARC.

MARC:
o we’ve been making it do something it wasn’t designed to do for 20+ years now
o designed purely to transmit data for printing; NOT for being indexed, searched, manipulated, etc.
o MARC is contextual (hello, ISBD) – machines don’t do context; they are dumb and literal
o Henriette Avram, who created MARC, is one of my heroes (sheroes) – BUT MARC is not a storage/retrieval format; it was designed for transmitting and display only

BIBFRAME is one effort to break that silo.

SLIDE 3: Terms and Definitions
Semantic web: W3C: “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”

Tim Berners-Lee – defined the semantic web/linked data

Semantic web is powered by Linked Data

Linked Data is structured data that is machine readable/actionable

SLIDE 4: [CATS]
Of course, we all know the internet (aka the web) is actually made of a series of tubes filled with CATS, not data.


Sources:
Subversive Cross Stitch
Pusheen.com
Nyan cat
Threadless – “Voltron” of cats

SLIDE 5: [xkcd]
Well, OK. It’s not really cats in tubes. The web is actually a bunch of standards, which keep proliferating but must be interoperable.
http://xkcd.com/927/
Alt text: charging issue solved. Wait, is it: mini-USB? Micro-USB?

SLIDE 6: (Semantic Web)
Semantic web – web of linked data

powered by rules/standards and technologies that provide structure to data

*Various (most commonly used/heard) standards and organizations that power the Semantic Web; most are used as part of, or by, linked data:

W3C: World Wide Web Consortium
URI: Uniform Resource Identifier (a URL is one type of URI; URNs are another)
HTTP: Hypertext Transfer Protocol – allows a URI to be actionable (“linkable” or “clickable”)
XML/HTML: mark-up languages
Microdata: nests metadata within existing web page content (embedded in HTML)
JSON: JavaScript Object Notation – data format for transmitting data objects (attribute-value pairs)
JSON-LD: method of encoding linked data using JSON
RDF: Resource Description Framework – conceptual description or modeling of information used in web resources; multiple RDF specifications (entity-relationship); most other semantic web standards are built on or use RDF
Turtle: Terse RDF Triple Language (a text syntax for writing RDF)
SPARQL: semantic query language for RDF; data is typically accessed through a SPARQL endpoint
SKOS: Simple Knowledge Organization System (for representing controlled vocabularies)
OWL: Web Ontology Language (knowledge representation language for ontologies, i.e. vocabularies/taxonomies)
FOAF: “friend of a friend” (ontology for describing persons)
Schema.org: Bing/Google/Yahoo – schemas for structured data markup on web pages
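To make the JSON / JSON-LD / Schema.org entries a little more concrete, here is a minimal Python sketch that builds a JSON-LD description of a book using Schema.org terms. The example.org URIs, title, and author are placeholders, and the script only serializes ordinary JSON; it does not run a JSON-LD processor or validate anything against Schema.org.

```python
import json

# Hypothetical book description; every example.org URI is a placeholder.
book = {
    "@context": "https://schema.org/",     # maps short names like "name" to Schema.org terms
    "@id": "http://example.org/books/1",   # the URI that names this "thing"
    "@type": "Book",
    "name": "An Example Title",
    # The author is another "thing" with its own URI, not just a text string.
    "author": {
        "@id": "http://example.org/people/42",
        "@type": "Person",
        "name": "A. Author",
    },
    "inLanguage": "en",
}

serialized = json.dumps(book, indent=2)
print(serialized)

# JSON-LD is still plain JSON, so any JSON parser can read it back.
parsed = json.loads(serialized)
```

The design point is that JSON-LD adds meaning (via `@context`, `@id`, `@type`) on top of a format web developers already use, rather than asking them to learn a new syntax.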

SLIDE 7: (Semantic) Web: LODLAM Subset
Other common terms and organizations within the LODLAM community:

BIBFRAME: data model for bibliographic description using linked data principles
Blacklight: discovery interface for a Solr index
o http://projectblacklight.org/
Solr: open source standalone enterprise search server; powerful indexing
o http://lucene.apache.org/solr/
Fedora: Flexible Extensible Digital Object Repository Architecture; open source architecture for storing, managing, and accessing digital content in the form of digital objects
o http://fedora-commons.org/


Hydra: repository solution; “Hydra is an ecosystem of components that lets institutions deploy robust and durable digital repositories (the body) supporting multiple ‘heads’: fully-featured digital asset management applications and tailored workflows. Its principle platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight”
o http://projecthydra.org/
ORCID: open, non-profit, community-driven effort to create and maintain a registry of unique researcher identifiers
o http://orcid.org/
VIVO: member-supported, open source software and an ontology for representing scholarship
o http://vivoweb.org/

PLUS existing controlled vocabularies such as LCSH, LC Classification, Dewey, etc.

SLIDE 8: Linked Data
Powers the semantic web
STRUCTURED data:
o can be queried, linked to/from, and integrated
o builds on existing web technologies
o machine (not human) friendly – actionable by machine
o uses controlled vocabularies
o structured data stored in various interoperable “containers” (mark-up, schemas, etc.)

Structure of linked data: triples:

a subject, a predicate, and an object – defines relationship between two things (entity-relationship model)

Best case: each element of a triple is a URI, then each element can connect with other elements

URI – dereferenceable (actionable/clickable)

Controlled vocabularies and identities are “things” represented by URIs – multiple sources will “connect through” one URI, providing the “web of data”

Things: represented by URIs; they are actionable and connect to other things/strings
Strings: text; also called literals; “dead ends” in the semantic web/linked data
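The triple structure and the things-vs-strings distinction can be sketched in a few lines of Python. This is an illustration, not a real RDF library: the `URI` and `Literal` classes and the `follow` helper are hypothetical, the example.org resource URIs are placeholders, and only the Dublin Core and FOAF predicate URIs are real vocabulary terms. Two separately published sets of triples connect only because they share one author URI, and traversal stops at literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    """A "thing": an identifier that can be followed to more data."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A "string": plain text, a dead end for traversal."""
    value: str


# Real vocabulary terms (Dublin Core, FOAF) used as predicates.
TITLE = URI("http://purl.org/dc/terms/title")
CREATOR = URI("http://purl.org/dc/terms/creator")
NAME = URI("http://xmlns.com/foaf/0.1/name")

# Hypothetical resource URIs.
BOOK = URI("http://example.org/catalog/book/1")
AUTHOR = URI("http://example.org/authority/person/42")

# Two separately published sets of (subject, predicate, object) triples
# that never mention each other, but share one URI (AUTHOR).
catalog = {(BOOK, TITLE, Literal("An Example Title")), (BOOK, CREATOR, AUTHOR)}
authority = {(AUTHOR, NAME, Literal("A. Author"))}
graph = catalog | authority


def follow(graph, start):
    """Collect every triple reachable from `start` by following URI objects.
    Literal objects are recorded but never expanded further."""
    reached, seen, frontier = set(), {start}, [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s != node:
                continue
            reached.add((s, p, o))
            if isinstance(o, URI) and o not in seen:
                seen.add(o)
                frontier.append(o)
    return reached


reachable = follow(graph, BOOK)
author_names = {o.value for _, p, o in reachable if p == NAME}
print(author_names)  # {'A. Author'}
```

Starting from the book, the author's name is only reachable because the catalog points at the same URI the authority data describes; had the catalog stored the author as a string, the walk would have stopped there.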

SLIDE 9: Triples and Quads
TRIPLE

a subject, a predicate, and an object – defines relationship between two things (entity-relationship model)

QUAD

a subject, a predicate, an object, and context
Allows relationships to have attributes, i.e. information about the relationship


Assertion: one way to express “trust” in, or “authenticate,” the relationship; “who” or “what” established this relationship?

ALL of the pieces of both TRIPLES and QUADS can be URIs; any piece that is not a URI is a “dead-end” or literal/string
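A rough Python sketch of how the fourth element of a quad can carry the assertion: each statement is (subject, predicate, object, context), where the context URI names the source that asserted it. All example.org URIs are hypothetical, and plain strings stand in for both URIs and literals to keep it short; real systems would use a named-graph format such as N-Quads or TriG.

```python
from collections import defaultdict

# Plain strings stand in for both URIs and literals to keep the sketch short;
# every example.org URI is hypothetical.
BOOK = "http://example.org/catalog/book/1"
SUBJECT = "http://purl.org/dc/terms/subject"
TITLE = "http://purl.org/dc/terms/title"
TOPIC = "http://example.org/vocab/topic/cats"
LOCAL = "http://example.org/graph/local-catalog"  # context: who asserted it
VENDOR = "http://example.org/graph/vendor-feed"

# (subject, predicate, object, context)
quads = [
    (BOOK, SUBJECT, TOPIC, LOCAL),
    (BOOK, SUBJECT, TOPIC, VENDOR),
    (BOOK, TITLE, "An Example Title", LOCAL),
]

# Drop the context to get the plain triple, and remember which sources asserted it.
asserted_by = defaultdict(set)
for s, p, o, context in quads:
    asserted_by[(s, p, o)].add(context)

for (s, p, o), sources in asserted_by.items():
    print(p.rsplit("/", 1)[-1], "asserted by", len(sources), "source(s)")
```

The same triple asserted by two sources collapses to one relationship with two pieces of provenance, which is the kind of "who established this?" information a bare triple cannot hold.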

SLIDE 10: Linked Open Data [mug]
Linked data is not guaranteed to be OPEN; you can have linked data in a closed network
However, OPEN is preferred

Why?

Tim Berners-Lee principles of linked data

1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
3. Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-based names when publishing data on the Web, i.e. link to other URIs so more things can be discovered.
5. OPEN data/content
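Principles 2 and 3 describe what "dereferencing" an HTTP URI means in practice. One common approach, assumed here rather than required by the principles, is HTTP content negotiation: the client asks for an RDF serialization via the `Accept` header. This Python sketch only builds the request with the standard library and does not send it, so it runs offline; the example.org URI is hypothetical, and whether any given server honors the header is an assumption.

```python
import urllib.request

# Hypothetical HTTP URI naming a "thing" (principles 1 and 2).
thing_uri = "http://example.org/authority/person/42"

# Principle 3: looking the URI up should return useful data. Asking for an RDF
# serialization (Turtle) through the Accept header is one way to request it.
request = urllib.request.Request(thing_uri, headers={"Accept": "text/turtle"})

print(request.get_method(), request.full_url, request.get_header("Accept"))
# -> GET http://example.org/authority/person/42 text/turtle

# Actually dereferencing it would be: urllib.request.urlopen(request)
# (left out so this sketch does not need network access).
```

A browser hitting the same URI would typically send `Accept: text/html` and get a human-readable page, which is how one URI can serve both people and machines.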

But what makes it OPEN?
OPEN data – “star” system – goal is 5 stars

1. Available on the web (whatever format) but with an open license, to be Open Data
2. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
3. As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
4. Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
5. Link your data to other people’s data to provide context

Source: W3C (https://www.w3.org/DesignIssues/LinkedData.html)
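Because each star builds on the previous one, the scheme behaves like a cumulative checklist. The Python sketch below is a hypothetical helper, not an official W3C tool: the `Dataset` field names are invented to mirror the five steps, and the rating stops at the first unmet step.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical checklist fields mirroring the five steps above.
    on_web_with_open_license: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool  # URIs, RDF, SPARQL
    links_to_other_data: bool


def stars(ds: Dataset) -> int:
    """Each star requires all the previous ones, so stop at the first missing step."""
    steps = [
        ds.on_web_with_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    count = 0
    for met in steps:
        if not met:
            break
        count += 1
    return count


scanned_table = Dataset(True, False, False, False, False)  # image scan, open license
csv_file = Dataset(True, True, True, False, False)         # CSV instead of Excel
linked_rdf = Dataset(True, True, True, True, True)         # RDF linked to others' data
print(stars(scanned_table), stars(csv_file), stars(linked_rdf))  # 1 3 5
```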

SLIDE 11: Linked Open Data Cloud
http://lod-cloud.net/
Aug. 2014
CC-BY-SA license

Representation of the Linked Open Data cloud – still growing as more data sets are published

Each node is a dataset that can be queried, linked to, and extracted from

At the center of the hub is DBpedia, which connects other data sources – interlinking published data sets

Upper right – coded green – library data:
id.loc.gov
VIAF
WorldCat


national libraries – French, German, and more
geographic data

SLIDE 12: BIBFRAME
Bibliographic Framework Initiative
Initiative defined in 2011
Source: https://www.loc.gov/bibframe/faqs/
Releasing our data from the MARC silo, but going beyond that too

SLIDE 13:
INDEPENDENT of any specific description standard (e.g. RDA, Resource Description and Access)
Is it replacing MARC? Yes, but also going beyond MARC

Focused on RELATIONSHIPS

Can look at it as: “blowing up” a MARC record into its discrete elements and defining their relationship with each other and beyond (connect with elements not traditionally included)

Classes: define a resource (under BIBFRAME)
Properties: further describe a resource
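The "blowing up" idea can be sketched in Python: one record becomes a set of separate statements about a Work, each of which could later be linked to something outside the record. This is a heavily simplified, hypothetical mapping, not the Library of Congress MARC-to-BIBFRAME conversion; only the BIBFRAME `Work` class URI and the RDF `type` URI are real, and the `EX` namespace, relation names, and `explode` helper are invented for illustration.

```python
# A MARC-like record reduced to {tag: {subfield code: value}}. The field choices
# and the mapping below are hypothetical and heavily simplified; they are not
# the Library of Congress MARC-to-BIBFRAME conversion.
record = {
    "245": {"a": "An Example Title"},
    "100": {"a": "Author, A."},
    "650": {"a": "Cats"},
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF_WORK = "http://id.loc.gov/ontologies/bibframe/Work"
EX = "http://example.org/"  # hypothetical namespace for minted URIs and relations

# Which (tag, subfield) feeds which simplified relationship.
FIELD_MAP = {
    ("245", "a"): EX + "title",
    ("100", "a"): EX + "contributor",
    ("650", "a"): EX + "subject",
}


def explode(record, work_uri):
    """Split one record into separate statements about a single Work."""
    triples = [(work_uri, RDF_TYPE, BF_WORK)]  # the class defines the resource
    for (tag, code), relation in FIELD_MAP.items():
        value = record.get(tag, {}).get(code)
        if value is not None:
            # Values stay strings here; linking them to URIs (e.g. an LCSH
            # heading for the subject) is what would turn them into "things".
            triples.append((work_uri, relation, value))
    return triples


for triple in explode(record, EX + "work/1"):
    print(triple)
```

In real BIBFRAME the title and contributor hang off intermediate resources rather than plain strings; the sketch flattens that to keep the "one record, many statements" idea visible.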

SLIDE 14: BIBFRAME 2.0 Model

Work – conceptual essence
Instance – embodiment of a work (one work can have multiple instances)
Item – copy of an instance – local holdings

Agents – people, organizations, jurisdictions, etc. associated with a work
Subjects – “aboutness” of a work
Events – occurrences, such as the recording of the content of a work
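The Work / Instance / Item layering can be shown as a handful of triples. This Python sketch uses the BIBFRAME 2.0 class URIs and the `bf:instanceOf` and `bf:itemOf` properties; the example.org resource URIs and the `subjects` helper are hypothetical. It answers "which local copies do we hold of this work?" by walking Item → Instance → Work.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical URIs for the resources themselves

work = EX + "work/1"
print_instance = EX + "instance/print"
ebook_instance = EX + "instance/ebook"
item_a = EX + "item/copy-a"
item_b = EX + "item/copy-b"

graph = {
    (work, RDF_TYPE, BF + "Work"),
    (print_instance, RDF_TYPE, BF + "Instance"),
    (ebook_instance, RDF_TYPE, BF + "Instance"),
    (print_instance, BF + "instanceOf", work),  # one Work, many Instances
    (ebook_instance, BF + "instanceOf", work),
    (item_a, RDF_TYPE, BF + "Item"),
    (item_b, RDF_TYPE, BF + "Item"),
    (item_a, BF + "itemOf", print_instance),  # local holdings
    (item_b, BF + "itemOf", print_instance),
}


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


instances = subjects(graph, BF + "instanceOf", work)
items = set().union(*(subjects(graph, BF + "itemOf", i) for i in instances))
print(len(instances), "instances,", len(items), "items")  # 2 instances, 2 items
```

The ebook instance has no items in this toy graph, which mirrors how an Instance can exist without local holdings.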

SLIDE 15: All the Linked Data Activity!
Review the current initiatives/activities: definitions and the active participants; more detail about the various activities will be included in the presentations today
NOT an exhaustive list

Currently largely independent of each other
Why? No shared triple store; infrastructure is a barrier to larger cooperative participation

LODLAM: linked open data in libraries, archives, and museums
o “informal, borderless network of enthusiasts, technicians, professionals and any number of other people who are interested in or working with Linked Open Data pertaining to galleries, libraries, archives, and museums”
o http://lodlam.net/
BIBFRAME: initiative for bibliographic data
o Library of Congress
o https://www.loc.gov/bibframe/ (bibframe.org content is now on the LC site)


BIBFRAME Lite: modular and layered vocabulary approach using the BIBFRAME vocabulary
o core set of classes/properties as a scaffolding; build on it with other vocabularies (Library, Archive, Rare Materials, etc.)
o National Library of Medicine; George Washington U; Zepheira
o http://www.bibfra.me/

LD4PE: Linked Data for Professional Educators
o educate the educators; building an Exploratorium of learning resources and defined competencies
o University of WA; Kent State U; Dublin Core Metadata Initiative (DCMI); Sungkyunkwan University (Korea); OCLC; Elsevier; Synaptica
o http://wiki.dublincore.org/index.php/Pet/ld4pe

LD4L: Linked Data for Libraries
o 2014-2016 Mellon Grant ($1 million)
o Cornell U Library; Harvard Library; Stanford U Libraries
o https://wiki.duraspace.org/pages/viewpage.action?pageId=41354028

LD4L Labs: continue LD4L
o “$1.5 million dollar grant from the Andrew W. Mellon Foundation, Linked Data for Libraries: LD4L Labs is a collaboration of Cornell, Harvard, Iowa, and Stanford to continue to advance the use and usefulness of linked data in libraries”
o https://wiki.duraspace.org/display/ld4l/LD4L+Labs
LD4P: Linked Data for Production
o LC; Columbia; Cornell; Harvard; Princeton; Stanford
o Multiple projects: each of the 6 core members contributes projects
o https://wiki.duraspace.org/pages/viewpage.action?pageId=74515029
o http://www.loc.gov/aba/pcc/documents/PCC-LD4P.docx

BIBFLOW: “Reinventing Cataloging: Models for the Future of Library Operations”
o focus is on workflows
o UC Davis Library; Zepheira
o https://www.lib.ucdavis.edu/bibflow/

LibHub: converting library MARC data to BIBFRAME and linked data formats; publishing and hosting the resulting content
o Zepheira
o http://www.libhub.org/

CLDI: Canadian Linked Data Initiative
o U of Toronto; McGill U; Université de Montréal; U of Alberta; U of British Columbia; Library & Archives Canada; Bibliothèque et Archives nationales du Québec; Canadiana
o https://connect.library.utoronto.ca/display/U5LD/Canadian+Linked+Data+Initiative+Home

LC Linked Data Service
o provides access to commonly found standards and vocabularies promulgated by the Library of Congress
o http://id.loc.gov/

OCLC: multiple initiatives (W3C Linked Data Platform; BIBFRAME; Schema.org; Schema.org Extend W3C Group; OCLC Works)

o WorldCat in Schema.org
o FAST: Faceted Application of Subject Terminology; http://fast.oclc.org/
o VIAF: Virtual International Authority File; http://viaf.org/


o WorldCat Entities (or Works); https://www.oclc.org/developer/develop/linked-data/worldcat-entities/worldcat-work-entity.en.html

o https://www.oclc.org/en-US/data.html
o https://www.oclc.org/developer/develop/linked-data.en.html

ADDENDUM:

Library Link Network
o Zepheira
o “Seeding the Web with Library locations, services, and content.”
o http://library.link/

Linked Data Collaboration Program
o Ex Libris
o Jan. 2016 press release; involves 30+ institutions
o http://www.exlibrisgroup.com/category/LinkedDataDiscussionPaper
o http://www.exlibrisgroup.com/files/Publications/LinkedDataattheServiceofLibraries.pdf

IGELU/ELUNA Special Interest Group on Linked Open Data
o “achieve essential linked open data features in all Ex Libris products where appropriate, both from the data publishing, the data consuming and the data integration perspective.”
o http://igelu.org/special-interests/lod

Blue Cloud Visibility
o SirsiDynix
o service to extract MARC records and transform them into BIBFRAME; enhanced with geographic data
o http://www.sirsidynix.com/products/bluecloud-visibility

SLIDE 16: [business cat image]
Questions?
“business cat” meme
http://es.memegenerator.net/Business-Cat

SLIDE 17: [xkcd]
https://xkcd.com/262/
ALT TEXT: “hey, at least I ran out of staples”