Bibliographic data in the Semantic Web – what issues do we face in getting it there? Gordon Dunsire Presented to the ALCTS Cataloging and Classification Section Executive Committee Forum, ALA Annual, 24 June 2011




Overview

Introduction to linked data and the Semantic Web

From record to statement: a paradigm shift

Some issues


Linked data and RDF

Resource Description Framework (RDF)

Designed for machine-processing of metadata at global scale (Semantic Web)

24/7/365

Trillions of operations per second

Everything must be disambiguated

Machines are dumb

Simplicity helps!

Machine-readable identifiers


RDF triple

Metadata expressed as “atomic” statements

A simple, single, irreducible statement

The title of this book is “Cataloguing is fun!”

Constructed in 3 parts: a “triple”

The title of this book is “Cataloguing is fun!”

Subject of the statement = Subject: This book

Nature of the statement = Predicate: has title

Value of the statement = Object: “Cataloguing is fun!”

This book – has title – “Cataloguing is fun!”

subject – predicate – object
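The three-part structure above maps naturally onto a small data type. A minimal Python sketch, assuming hypothetical example.org URIs for “this book” and “has title” (only the title string comes from the slide), that holds one triple and writes it out as a single line of N-Triples, a plain-text RDF serialization:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the thing described ("this book")
    predicate: str  # URI of the property ("has title")
    obj: str        # a URI or a literal string


def to_ntriples(t: Triple, object_is_literal: bool) -> str:
    """Serialize one triple as an N-Triples line: <s> <p> o ."""
    if object_is_literal:
        # Escape characters that N-Triples does not allow raw inside a literal.
        escaped = (t.obj.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical URIs; only the title string comes from the example above.
book_title = Triple(
    subject="http://example.org/book/b12345",
    predicate="http://example.org/property/hasTitle",
    obj="Cataloguing is fun!",
)
print(to_ntriples(book_title, object_is_literal=True))
```

Each line of output is one self-contained statement, which is what makes triples easy to merge from many sources.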


Machine-readable identifiers

Uniform Resource Identifier (URI)

Can be any unique combination of numbers and letters

No intrinsic meaning; it’s just an identifier

Can look like a URL

“Cool” URI: exploits existing processes developed for the World Wide Web

http://iflastandards.info/ns/isbd/elements/P1001

But does not lead to a Web page (in principle ...)

RDF requires the subject and predicate of a triple to be URIs

The object can be a URI or a literal string (“Cataloguing is fun!”)
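The rule that subject and predicate must be URIs, while the object may be a URI or a literal, can be enforced with two small types. A minimal sketch under stated assumptions: the URI check is a rough one (a scheme and no whitespace), not full RFC 3986 validation, and the example.org URIs are hypothetical. The full RDF model also allows blank nodes as subjects and objects; this sketch follows the simplified rule stated on the slide.

```python
from dataclasses import dataclass
from typing import Tuple, Union
from urllib.parse import urlsplit


@dataclass(frozen=True)
class URI:
    value: str

    def __post_init__(self) -> None:
        # Rough sanity check only (not full RFC 3986 validation):
        # require a scheme such as "http" and no whitespace.
        if not urlsplit(self.value).scheme or any(c.isspace() for c in self.value):
            raise ValueError(f"not an absolute URI: {self.value!r}")


@dataclass(frozen=True)
class Literal:
    value: str


Node = Union[URI, Literal]


def make_triple(subject: URI, predicate: URI, obj: Node) -> Tuple[URI, URI, Node]:
    """Build a triple, enforcing that subject and predicate are URIs."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subject and predicate must be URIs")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("object must be a URI or a literal")
    return (subject, predicate, obj)


# Hypothetical URIs for "this book" and "has title".
book = URI("http://example.org/book/b12345")
has_title = URI("http://example.org/property/hasTitle")

triple = make_triple(book, has_title, Literal("Cataloguing is fun!"))

# A human-readable phrase cannot stand in for the subject.
try:
    make_triple(Literal("This book"), has_title, Literal("Cataloguing is fun!"))
except TypeError as err:
    print("rejected:", err)
```

Keeping literals and URIs as distinct types means the machine never has to guess whether a string is an identifier or a value.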


Bibliographic record: 12345

Title: Cataloguing is fun!

Author: Mary MacDonald

Content type: text

Media type: microform

LCSH: Cataloging

subject predicate object

b12345 Author “Mary MacDonald”

b12345 Title “Cataloguing is fun!”

b12345 Content type “text”

b12345 Media type “microform”

b12345 LCSH “Cataloging”

Name authority record: 8765

Heading: MacDonald, Mary

n8765 Heading “MacDonald, Mary”

t1234 Preferred label “microform”

t9876 Preferred label “text”

lc1234 Heading “Cataloging”
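The shift from record to statement can be sketched in a few lines: each field of the record becomes one triple with the record identifier as its subject. One way to then use the authority identifiers listed above, assumed here for illustration only, is to replace a literal object with the matching authority identifier. The hand-made lookup table is hypothetical; in practice, matching strings such as “Mary MacDonald” to the heading “MacDonald, Mary” is the hard part.

```python
# A flat legacy record, as above (field label -> value).
record_id = "b12345"
record = {
    "Title": "Cataloguing is fun!",
    "Author": "Mary MacDonald",
    "Content type": "text",
    "Media type": "microform",
    "LCSH": "Cataloging",
}


def record_to_triples(subject, fields):
    """One record becomes one triple per field; the record ID is the subject."""
    return [(subject, field, value) for field, value in fields.items()]


# Hypothetical lookup from (field, literal) to authority identifier,
# using the identifiers of the authority triples above.
authority_ids = {
    ("Author", "Mary MacDonald"): "n8765",
    ("Content type", "text"): "t9876",
    ("Media type", "microform"): "t1234",
    ("LCSH", "Cataloging"): "lc1234",
}


def link(triples, lookup):
    """Replace a literal object with an authority identifier where one is known."""
    return [(s, p, lookup.get((p, o), o)) for s, p, o in triples]


for triple in link(record_to_triples(record_id, record), authority_ids):
    print(*triple, sep=" | ")
```

The title stays a literal because no authority exists for it; every other object now points at a separately described thing.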


Identifiers for properties

Predicates are known as properties in RDF

http://iflastandards.info/ns/isbd/elements/P1004 “has key title”

Properties can be mixed and matched

Chosen from different sources (element sets)

Different element sets contain similar properties

http://RDVocab.info/Elements/keyTitleManifestation “Key title (Manifestation)”

Some element sets are not available in RDF

E.g. MARC21
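Mixing properties works because a machine compares property URIs, not labels, which also means two similar properties are simply different to it. A minimal sketch using the two key-title URIs above, with a hypothetical serial and value. Treating the two as interchangeable is a decision someone must state explicitly (RDF vocabularies express such links with properties like rdfs:subPropertyOf); here a plain, hypothetical mapping dict stands in for that.

```python
ISBD_KEY_TITLE = "http://iflastandards.info/ns/isbd/elements/P1004"
RDA_KEY_TITLE = "http://RDVocab.info/Elements/keyTitleManifestation"

# Element set and label for each property, as given above.
PROPERTY_INFO = {
    ISBD_KEY_TITLE: ("ISBD", "has key title"),
    RDA_KEY_TITLE: ("RDA", "Key title (Manifestation)"),
}

# Hypothetical serial described with properties from both element sets.
serial = "http://example.org/serial/s1"
triples = [
    (serial, ISBD_KEY_TITLE, "Example key title"),
    (serial, RDA_KEY_TITLE, "Example key title"),
]


def values_for(triples, subject, prop, mapping=None):
    """Return objects for subject/prop, matching property URIs exactly.

    `mapping` is an optional, hypothetical local table that redirects one
    property URI to another before comparison.
    """
    mapping = mapping or {}
    return [o for s, p, o in triples
            if s == subject and mapping.get(p, p) == prop]


# Each property URI finds only its own statement.
for prop, (element_set, label) in PROPERTY_INFO.items():
    print(element_set, label, values_for(triples, serial, prop))

# Only with an explicit mapping are both statements found together.
print(values_for(triples, serial, ISBD_KEY_TITLE, {RDA_KEY_TITLE: ISBD_KEY_TITLE}))
```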


Choosing properties/URIs for legacy records

Closest inclusive meaning

Minimises information loss

Check the definition

ISBD’s “has title proper” is better than Dublin Core’s “title” (“A name given to the resource.”)

Check other semantic constraints

RDA’s “titleManifestation” implies that a triple’s subject URI is a Manifestation

No good for non-FRBRized records
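Why a semantic constraint matters for legacy data can be shown with the RDF Schema domain rule: if a property has domain C and a triple uses that property, the subject is inferred to be an instance of C. A minimal sketch, assuming hypothetical prefixed names (rda:, frbr:, ex:) as shorthand for full URIs; only the titleManifestation/Manifestation relationship comes from the slide.

```python
RDF_TYPE = "rdf:type"

# Hypothetical domain declarations, written as prefixed names for brevity.
# Per the slide, RDA's titleManifestation implies a Manifestation subject.
# A property absent from this table (e.g. "ex:title") has no domain.
DOMAINS = {
    "rda:titleManifestation": "frbr:Manifestation",
}


def inferred_types(triples):
    """Domain rule: if (s, p, o) holds and p has domain C, infer (s, rdf:type, C)."""
    inferred = set()
    for s, p, _o in triples:
        cls = DOMAINS.get(p)
        if cls is not None:
            inferred.add((s, RDF_TYPE, cls))
    return inferred


# A legacy, non-FRBRized record mapped with the RDA property ...
legacy = [("ex:b12345", "rda:titleManifestation", "Cataloguing is fun!")]
print(inferred_types(legacy))
# ... is now implied to be a Manifestation, which the record never stated.

# The same record mapped with a property that has no domain implies nothing.
print(inferred_types([("ex:b12345", "ex:title", "Cataloguing is fun!")]))
```

The inference is automatic and silent, so a careless property choice can assert something about the data that the cataloguer never intended.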


Metadata rights

Potential legal minefield

Multiple agencies contributing to one record

Anxiety that “others” may use open triples to build rival, competitive services

Main rights associated with the record?

i.e. as an aggregation of triples

Can a triple be copyrighted if its component URIs are openly published?


“Minting” URIs for resources

Specific subject of a triple

Mainly bibliographic resources

URIs for Persons, Places, etc. taken from RDF “authorities”

FRBRized records need separate URIs for the Work, Expression, Manifestation (and Item)

“Standard” identifiers are only a partial solution

ISBN, ISSN, national bibliography numbers, etc.

Risk of different agencies creating different URIs for the same resource

Inefficient, and namespaces are costly to maintain
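Minting is mechanically simple; the cost lies in coordination. A minimal sketch, assuming a hypothetical pattern of base namespace, FRBR entity, and local record ID, that mints one URI per FRBR entity and then shows two hypothetical agencies producing unrelated URIs for the same book.

```python
FRBR_ENTITIES = ("work", "expression", "manifestation", "item")


def mint_frbr_uris(base: str, local_id: str) -> dict:
    """Mint one URI per FRBR entity under an agency's own namespace.

    The pattern <base>/<entity>/<local_id> is a hypothetical convention.
    """
    base = base.rstrip("/")
    return {entity: f"{base}/{entity}/{local_id}" for entity in FRBR_ENTITIES}


# Two hypothetical agencies catalogue the same book under their own IDs.
agency_a = mint_frbr_uris("http://agency-a.example.org/id/", "b12345")
agency_b = mint_frbr_uris("http://agency-b.example.org/id", "rec-987")

for entity in FRBR_ENTITIES:
    print(entity, agency_a[entity], agency_b[entity])

# Nothing in the URIs tells a machine these describe the same resource.
print(agency_a["manifestation"] == agency_b["manifestation"])
```

Each agency must now maintain its own namespace, and someone must separately assert that the two sets of URIs identify the same resources.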


Other costs

Providing access to triples

Data dump, triple store, data query (SPARQL)

URIs should last forever

Preservation and archive regime required

Dereferencing services

Providing human- and machine-readable information about a URI

Cost of re-engineering systems, re-designing interfaces, re-training cataloguers ...

But long-term benefits will justify the investment
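A dereferencing service returns different representations of information about the same URI to different clients. A common linked-data practice, assumed here rather than stated on the slide, is HTTP content negotiation on the Accept header. A minimal sketch with hypothetical URIs and content; it only chooses a representation and does not run a web server.

```python
from typing import Tuple

# Two representations of information about one hypothetical URI.
REPRESENTATIONS = {
    "text/turtle": (
        '<http://example.org/book/b12345> '
        '<http://example.org/property/hasTitle> "Cataloguing is fun!" .\n'
    ),
    "text/html": "<p>Title: Cataloguing is fun!</p>\n",
}


def dereference(accept_header: str) -> Tuple[str, str]:
    """Choose a representation from an HTTP Accept header.

    Simplified: takes media types in the order listed, ignores q-values and
    wildcards, and falls back to the human-readable HTML page.
    """
    for media_range in accept_header.split(","):
        media_type = media_range.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    return "text/html", REPRESENTATIONS["text/html"]


print(dereference("text/turtle")[0])                      # a machine client
print(dereference("text/html,application/xhtml+xml")[0])  # a browser
```

A production service would honour q-values and keep these representations available for as long as the URI itself, which is part of the preservation cost noted above.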


The Semantic Web ecosystem

Not just professionally generated triples

Machines generate triples by parsing content and by semantic inferencing

RDA anticipates ...

User-generated tags

The madness (or wisdom) of crowds

Other communities generate relevant triples

Memory institutions, publishers, reference services

Everybody uses triples

In ways beyond our dreams ...


Thank you


ALA

Cataloging & Classification Quarterly

MARCIVE, Inc.