Moving Library Metadata toward Linked Data: Opportunities Provided by the eXtensible Catalog

Jennifer Bowen, University of Rochester DC-2010 Conference October 20, 2010, Pittsburgh, PA Moving Library Metadata toward Linked Data: Opportunities Provided by the eXtensible Catalog


DESCRIPTION

Presented at DCMI-2010, a conference of the Dublin Core Metadata Initiative, in Pittsburgh, PA, on October 20, 2010

TRANSCRIPT

Page 1: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Jennifer Bowen, University of Rochester

DC-2010 Conference, October 20, 2010, Pittsburgh, PA

Page 2: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

About me…

Currently:
- Librarian
- Technical services administrator
- Software development team co-leader

Formerly:
- Cataloger (MARC)
- Standards developer (RDA)

Maybe someday… Linked Data Expert?

Page 3: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

My Topics Today

Is it feasible to turn legacy library MARC metadata into Linked Data in an automated environment, and how can eXtensible Catalog (XC) software play a role in that process?

Image source: www.blog.kdl.org

Page 4: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Semantic Web and Linked Data

Semantic Web: a set of technologies that allow computers to understand the meaning of information on the web

Linked Data: a mechanism for exposing, sharing and connecting data on the web, using identifiers and relationships


Page 5: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Linked Data “Expectations of Behavior”

– Use URIs as names for things.
– Use HTTP URIs so that people can look up those names.
– When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
– Include links to other URIs so that they can discover more things.

Tim Berners-Lee, “Design issues”, 2006. http://www.w3.org/DesignIssues/LinkedData.html

Page 6: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Linked Data: RDF triple

Subject: This presentation
Predicate: has creator
Object: Jennifer Bowen
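A minimal sketch of how a triple like the one on this slide could be written down in code, assuming Python and the N-Triples line format. The `example.org` URI for "this presentation" is a hypothetical placeholder, and the choice of the Dublin Core `creator` element as the predicate is an assumption made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs, the object is a literal here."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a triple with a plain literal object as one N-Triples line."""
    # Escape backslashes and double quotes inside the literal.
    literal = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{literal}" .'


# Hypothetical URI for "this presentation"; not a real identifier.
triple = Triple(
    subject="http://example.org/presentations/dc-2010-bowen",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="Jennifer Bowen",
)
print(to_ntriples(triple))
```

Replacing the literal object with a URI for the person would turn this statement into a link that other data sets can follow.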

Page 7: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

A Reality Check

Image: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 8: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Teaching MARC metadata new tricks?


Image source: http://www.englishcafe.com/node/2337

Page 9: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Turning legacy data into Linked Data…

How do we even get started?


Page 10: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Getting Started

To create Linked Data, we need:
– Software to transform legacy data
– Analysis: mapping of legacy metadata to Linked Data properties

Page 11: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

The software…


eXtensible Catalog (XC) is open source, user-centered, next generation software for libraries.

XC provides a discovery system and a set of tools for libraries to manage metadata and build applications.

Page 12: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

XC Software Components

- User Interface: XC User Interface (website on Drupal CMS)
- Metadata Processing: Metadata Services Toolkit
- Connectivity tools: NCIP Toolkit, OAI Toolkit
- Integrated Library System; Repository

Page 13: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

XC’s original metadata goals

- Aggregate MARC and other metadata for use in new applications

- Define a FRBR-based metadata schema to support XC’s user-interface functionality

- Create a software application to process batches of metadata through a set of services


Page 14: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Software development: a moving target!

Page 15: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

XC and Linked Data

How can XC help move legacy library metadata closer to Linked Data?

NOT among XC’s original goals

However, XC software creates an opportunity to contribute to this effort and provides important “lessons learned”


Page 16: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Converting MARC to Linked Data

What XC software can do:
– Convert MARC codes to vocabulary values
– Remove extraneous data
– Normalize inconsistencies
– Map most MARC fields/subfields and parse to appropriate FRBR Group 1 entity records
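The first three steps can be illustrated with a small sketch. This is not XC's code: the function names are hypothetical, the lookup table is a three-entry subset of the MARC language code list, and the punctuation rule is a simplified assumption about what counts as extraneous data.

```python
import re

# Small illustrative subset of the MARC language code list.
MARC_LANGUAGE_CODES = {"eng": "English", "ita": "Italian", "rus": "Russian"}


def language_label(code: str) -> str:
    """Convert a MARC language code to a vocabulary label; unknown codes pass through."""
    cleaned = code.strip().lower()  # normalize case and stray whitespace first
    return MARC_LANGUAGE_CODES.get(cleaned, cleaned)


def strip_isbd_punctuation(value: str) -> str:
    """Remove a trailing ISBD separator such as ' /' or ' :' from a field value."""
    return re.sub(r"\s*[/:;=,]\s*$", "", value).strip()


print(language_label(" ENG "))
print(strip_isbd_punctuation("The divine comedy /"))
```

Passing unknown codes through unchanged keeps the data intact for later review, which seems safer in a batch process than guessing a label.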

Page 17: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Converting MARC to Linked Data

Problematic areas:
– Some MARC fields/subfields are difficult to map to appropriate FRBR entities
– Tracking relationships between FRBR entity records: How many relationships can we support with XC software?

Page 18: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

MARC to XC Schema Transformation

- Parses MARCXML records into linked FRBR-based records
- Maps MARCXML data elements to Linked-Data-compatible elements in the XC Schema
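A rough sketch of the parsing idea, assuming the MARC record has already been read into (tag, subfields) pairs. The mapping table below is a hypothetical four-row illustration, not the mapping the XC transformation service actually uses, and the element names are placeholders.

```python
from collections import defaultdict

# Hypothetical mapping of (tag, subfield code) to (FRBR entity, element name).
FIELD_MAP = {
    ("100", "a"): ("work", "creator"),
    ("240", "a"): ("work", "titleOfTheWork"),
    ("041", "a"): ("expression", "language"),
    ("245", "a"): ("manifestation", "title"),
}


def split_into_entities(fields):
    """Group mapped subfield values under the FRBR entity they describe."""
    entities = defaultdict(list)
    for tag, subfields in fields:
        for code, value in subfields:
            target = FIELD_MAP.get((tag, code))
            if target:  # unmapped data is simply skipped in this sketch
                entity, element = target
                entities[entity].append((element, value))
    return dict(entities)


record = [
    ("041", [("a", "eng"), ("h", "ita")]),
    ("100", [("a", "Dante Alighieri,"), ("d", "1265-1321.")]),
    ("240", [("a", "Divina commedia."), ("l", "English")]),
    ("245", [("a", "The divine comedy /")]),
]
entities = split_into_entities(record)
print(entities)
```

One bibliographic record yields several entity records here, which is why links between those records then have to be created and maintained.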

Page 19: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Managing Relationships

Page 20: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Managing Relationships


Page 21: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Issue: Managing Multiple Relationships


MARC bibliographic records can refer to multiple FRBR entities of the same type (analytics that represent multiple works/expressions, e.g. tracks on a CD)

Page 22: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Issue: Beyond FRBR Group 1 Entities

MARC “Alternate Graphic Representation” (880 fields) can contain data that belong in records for Group 2 and Group 3 entities

Contributor:
700 1  ‡6 880-08 ‡a Vasil’ev, Maksim.
880 1  ‡6 700-08 ‡a Васильев, Максим.

Subject:
600 10 ‡6 880-06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952-
880 10 ‡6 600-06 ‡a Путин, Владимир Владимирович, ‡d 1952-
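The ‡6 linkage in these examples is what ties each 880 field to its partner. A minimal sketch of that pairing follows, assuming fields have already been read into (tag, subfields) tuples; the function names and the data structure are hypothetical.

```python
from typing import Dict, List, Tuple

# Each field is (tag, subfields); subfields is a list of (code, value) pairs.
Field = Tuple[str, List[Tuple[str, str]]]


def subfield(field: Field, code: str) -> str:
    """Return the first value for a subfield code, or an empty string."""
    return next((v for c, v in field[1] if c == code), "")


def pair_880_fields(fields: List[Field]) -> Dict[str, Tuple[Field, Field]]:
    """Pair each regular field with its 880 partner using the subfield 6 linkage."""
    originals, alternates = {}, {}
    for field in fields:
        link = subfield(field, "6")
        if not link:
            continue
        # Linkage looks like "880-08" or "700-08"; ignore any "/script" suffix.
        linked_tag, _, occurrence = link.split("/")[0].partition("-")
        if field[0] == "880":
            alternates[f"{linked_tag}-{occurrence}"] = field
        else:
            originals[f"{field[0]}-{occurrence}"] = field
    return {k: (originals[k], alternates[k]) for k in originals if k in alternates}


record = [
    ("700", [("6", "880-08"), ("a", "Vasil’ev, Maksim.")]),
    ("880", [("6", "700-08"), ("a", "Васильев, Максим.")]),
    ("600", [("6", "880-06"), ("a", "Putin, Vladimir Vladimirovich,"), ("d", "1952-")]),
    ("880", [("6", "600-06"), ("a", "Путин, Владимир Владимирович,"), ("d", "1952-")]),
]
pairs = pair_880_fields(record)
for key, (original, alternate) in pairs.items():
    print(key, subfield(original, "a"), "|", subfield(alternate, "a"))
```

Once paired, the tag of the original field (700 or 600) indicates whether the alternate script belongs with a contributor or a subject, rather than with the Group 1 record that carried it.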

Page 23: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

If we were to parse this 880 data correctly:

Labels on the diagram:
- Alternative script of name, from 880
- Alternative script of subject, from 880

Page 24: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Issue: Related Group 1 Entities

Language attribute for a related expression

041 1  ‡a eng ‡h ita
100 0  ‡a Dante Alighieri, ‡d 1265-1321.
240 10 ‡a Divina commedia. ‡l English
245 14 ‡a The divine comedy / ‡c Dante ; a new verse translation by C.H. Sisson.
500    ‡a Translation of: Divina commedia.
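In this record, 041 ‡a describes the translation in hand, while ‡h describes the original, which is a different (related) expression. A small sketch of separating the two, with hypothetical key names:

```python
from typing import Dict, List, Tuple


def languages_from_041(subfields: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split 041 language codes by role: subfield a = this text, subfield h = original."""
    roles = {"a": "expression_language", "h": "original_language"}
    result: Dict[str, List[str]] = {"expression_language": [], "original_language": []}
    for code, value in subfields:
        if code in roles:
            result[roles[code]].append(value)
    return result


field_041 = [("a", "eng"), ("h", "ita")]
print(languages_from_041(field_041))
```

The value collected under `original_language` would need a home in a record for the related Italian expression, which is the relationship this slide raises.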

Page 25: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

If we were to parse 041 ‡h data…

Labels on the diagram:
- Alternative script of name, from 880
- Original language, from 041 ‡h
- Alternative script of subject, from 880

Page 26: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Managing Relationships Between Entities

Labels on the diagram:
- Original language, from 041 ‡h
- Alternative script of subject, from 880
- Alternative script of name, from 880

Page 27: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

What we are learning from XC

Maintaining links between separate FRBR entity records in a production environment monopolizes system resources and may not be scalable.

- new records
- changed records
- deleted records
- changed relationships

Page 28: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

But wait…

What does this emphasis on FRBR have to do with Linked Data?

If we can map a MARC data element to a FRBR entity, we can probably convert it to Linked Data.

[Diagram: FRBR Group 1 Entities]

Page 29: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog


But do we have to?

- Do we have to be able to map MARC elements to a FRBR entity in order to create Linked Data?

- Would managing RDF triples be more scalable than managing FRBR-based records and the relationships between those records?

Page 30: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Best Practices for Linked Data

- Unique identifiers for XC metadata records
- Data elements from registered schemas
- Registered vocabularies

By attempting to follow best practices in XC for Linked Data, we hope to facilitate eventual output of XC metadata in RDF.

Page 31: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

RDF Triple

Subject: This resource
Predicate: has subject
Object: Poets, American

URIs for each?

Page 32: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

RDF Triple – Record identifiers

Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject
Object: Poets, American

Page 33: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Identifiers for XC Schema records

<?xml version="1.0" encoding="UTF-8"?>
<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:rdvocab="http://rdvocab.info/Elements"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:rdarole="http://rdvocab.info/roles">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    <dcterms:subject xsi:type="dcterms:LCC">PS3505.U334</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">811/.52</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">B</dcterms:subject>
    <rdarole:author>Sawyer-Lauçanno, Christopher, 1951-</rdarole:author>
    <rdvocab:titleOfTheWork>E.E. Cummings :</rdvocab:titleOfTheWork>
    <xc:subject xsi:type="dcterms:LCSH">Cummings, E. E. (Edward Estlin), 1894-1962.</xc:subject>
    <xc:subject xsi:type="dcterms:LCSH">Poets, American-20th century-Biography.</xc:subject>
  </xc:entity>
</xc:frbr>

A persistent, globally unique identifier for each XC Schema record

Page 34: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

RDF Triple - Registered Data Elements

Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject (http://www.extensiblecatalog.info/Elements/subject)
Object: Poets, American

Page 35: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog


DCMI

Page 36: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog


RDA

Page 37: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog


XC

Page 38: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

XC Schema “work” record: data elements

<?xml version="1.0" encoding="UTF-8"?>
<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:rdvocab="http://rdvocab.info/Elements"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:rdarole="http://rdvocab.info/roles">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    <dcterms:subject xsi:type="dcterms:LCC">PS3505.U334</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">811/.52</dcterms:subject>
    <dcterms:subject xsi:type="dcterms:DDC">B</dcterms:subject>
    <rdarole:author>Sawyer-Lauçanno, Christopher, 1951-</rdarole:author>
    <rdvocab:titleOfTheWork>E.E. Cummings :</rdvocab:titleOfTheWork>
    <xc:subject xsi:type="dcterms:LCSH">Cummings, E. E. (Edward Estlin), 1894-1962.</xc:subject>
    <xc:subject xsi:type="dcterms:LCSH">Poets, American-20th century-Biography.</xc:subject>
  </xc:entity>
</xc:frbr>

Data elements from registered namespaces for DC terms, RDA roles and vocab, and XC
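Because each element in such a record comes from a registered namespace, the record id and the element names already supply two of the three parts of a triple. The following is only a sketch of that idea using Python's standard `xml.etree.ElementTree`; the function name is hypothetical, the record is a shortened copy of the one above, and joining namespace and local name with "/" is an assumption about how property URIs would be formed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the work record, kept to two elements for illustration.
XC_RECORD = """<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements"
    xmlns:rdvocab="http://rdvocab.info/Elements">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    <rdvocab:titleOfTheWork>E.E. Cummings :</rdvocab:titleOfTheWork>
    <xc:subject>Poets, American</xc:subject>
  </xc:entity>
</xc:frbr>"""

XC_NS = "http://www.extensiblecatalog.info/Elements"


def record_to_statements(xml_text):
    """Yield (subject, predicate, literal) for each child element of each entity."""
    root = ET.fromstring(xml_text)
    for entity in root.iter(f"{{{XC_NS}}}entity"):
        subject = entity.get("id")  # the record identifier names the resource
        for element in entity:
            # ElementTree tags look like "{namespace}localname".
            namespace, local = element.tag[1:].split("}")
            # Assumption: the property URI is namespace + "/" + local name.
            predicate = namespace.rstrip("/") + "/" + local
            yield subject, predicate, (element.text or "").strip()


statements = list(record_to_statements(XC_RECORD))
for statement in statements:
    print(statement)
```

The objects are still plain strings at this point; swapping them for vocabulary URIs is a separate step.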

Page 39: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

RDF Triple - Registered Vocabularies

Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject (http://www.extensiblecatalog.info/Elements/subject)
Object: Poets, American (http://id.loc.gov/authorities/sh85103735#concept)

Page 40: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

<?xml version="1.0" encoding="UTF-8"?>
<xc:frbr xmlns:xc="http://www.extensiblecatalog.info/Elements" … xmlns:subjid="id.loc.gov/authorities">
  <xc:entity type="work" id="oai:mst.rochester.edu:MST/MARCToXCTransformation/10081">
    …
    <xc:subject xsi:type="dcterms:LCSH">Poets, American-20th century-Biography.</xc:subject>
    <xc:subject xsi:type="dcterms:LCSH" subjid="sh85103735#concept">Poets, American</xc:subject>
    <xc:temporal>20th century</xc:temporal>
    <xc:type>Biography</xc:type>
  </xc:entity>
</xc:frbr>

XC Work record with embedded URI for LCSH “Poets, American”

Page 41: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

RDF Triple

Subject: This resource (oai:mst.rochester.edu:MST/MARCToXCTransformation/10081)
Predicate: has subject (http://www.extensiblecatalog.info/Elements/subject)
Object: Poets, American (http://id.loc.gov/authorities/sh85103735#concept)
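With all three parts identified, the triple can be written out. The sketch below assumes a `subjid` attribute like the one in the XC work record with the embedded LCSH URI, and falls back to a literal when no identifier is present. The function name is hypothetical, and using the OAI identifier directly as the subject is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed base for Library of Congress authority URIs, following the slide's example.
LOC_AUTHORITIES = "http://id.loc.gov/authorities/"
SUBJECT_PROPERTY = "http://www.extensiblecatalog.info/Elements/subject"


def subject_triple(record_id: str, element: ET.Element) -> str:
    """Build one N-Triples line; use a URI object when a subjid attribute is present."""
    subjid = element.get("subjid")
    if subjid:
        obj = f"<{LOC_AUTHORITIES}{subjid}>"
    else:
        # No vocabulary identifier available: fall back to an escaped literal.
        text = (element.text or "").replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{text}"'
    return f"<{record_id}> <{SUBJECT_PROPERTY}> {obj} ."


record_id = "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081"
with_uri = ET.fromstring('<subject subjid="sh85103735#concept">Poets, American</subject>')
without_uri = ET.fromstring("<subject>Poets, American</subject>")
print(subject_triple(record_id, with_uri))
print(subject_triple(record_id, without_uri))
```

The fallback matters for legacy data, since many headings will not yet have been matched to a registered vocabulary.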

Page 42: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Experimenting with Linked Data

- Within a MARC or MARCXML environment?
  - Possible to give each record a URI
  - MARC elements themselves don’t have URIs
  - How to embed multiple URIs for registered vocabularies in MARC?
- XC enables experimentation outside of a MARC environment with data that originated as MARC

Page 43: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Making Linked Data a Priority for XC

– Balancing goals
– Time/funding constraints
– What’s our use case?
  – Output of Linked Data from XC vs.
  – Using Linked Data within XC?

Page 44: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

XC Linked Data Accomplishments

XC has set the stage for Linked Data by:
- Providing a platform for creating Linked Data using XC software
- Ensuring that XC Schema records can be converted to RDF triples as easily as possible
- Enabling others to build upon what we have accomplished so far

Page 45: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

Next Steps

- Monitor RDA implementations
- Develop XC authority control service
- Enable RDF output of XC Schema metadata
- Encourage libraries to use XC software and contribute to the XC user community
- Seek funding for additional software development

Page 46: Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eXtensible Catalog

www.eXtensiblecatalog.org

Jennifer Bowen

Thank you! Questions?