Establishing the Connection: Creating a Linked Data Version of the BNB. Neil Wilson, Head of Metadata Services

Post on 10-Dec-2014

DESCRIPTION

Presentation for the Talis Linked Data in Libraries event, July 14, 2011. Describes some of the choices made and lessons learned in migrating from traditional bibliographic metadata to linked open data.

TRANSCRIPT

Page 1: Establishing the Connection: Creating a Linked Data Version of the BNB

Establishing the Connection: Creating a Linked Data Version of the BNB

Neil Wilson

Head of Metadata Services

Page 2: Establishing the Connection: Creating a Linked Data Version of the BNB

Changing Expectations: Public Sector Metadata

The Web has accelerated development of a collaboration culture & fostered expectations that information & content should be as freely available as the Internet itself

Many wider benefit arguments have been advanced for public bodies to make their data freely available

2009 saw an increasing Government commitment to the principle of opening up public data for wider re-use.

The “Putting the Frontline First: Smarter Government” report required “the majority of government-published information to be reusable, linked data by June 2011”

Page 3: Establishing the Connection: Creating a Linked Data Version of the BNB

Developing an Open Metadata Strategy: Choices and Challenges

When developing an open metadata strategy we wanted to:

Try and break away from library specific formats e.g. MARC and use more cross domain XML based standards e.g. DC, RDF etc

Develop the new formats with communities using the metadata

Get some form of attribution while also adopting a licensing model appropriate to the widest re-use of the metadata

Adopt a multi track approach addressing the needs of: traditional libraries; researchers wanting to ‘data mine’ catalogues; & new linked data developers & users

…And deliver the above with decreasing resources

Page 4: Establishing the Connection: Creating a Linked Data Version of the BNB

First Steps Toward An Open Metadata Strategy: During 2010 We…

Developed a capability to supply metadata using RDF/XML standards used in the wider web community

Conducted trials with a range of new users including: the UK Intellectual Property Office & UNESCO

Developed a free Z39.50 MARC record download service for libraries to assist with derived cataloguing etc

Hosted a linked data workshop with 40 representatives from key international organisations

Page 5: Establishing the Connection: Creating a Linked Data Version of the BNB

Current Status: Since August 2010 We Have:

Created a new email enquiry point for BL metadata issues: [email protected]

Signed up nearly 400 organisations worldwide to the free MARC21 Z39.50 service

Worked with JISC, Talis & other linked data implementers on technical challenges, standards & licensing issues

Begun to offer sets of RDF/XML metadata under a Creative Commons 0 (CC0) license

Supplied multi-million record sets to organisations including: the Open Bibliography Project, the Open Library & Wikimedia Commons

Page 6: Establishing the Connection: Creating a Linked Data Version of the BNB

Library Metadata & The Promise of Linked Data

Traditional library metadata uses a self contained, proprietary document based model

The Semantic Web uses a more dynamic data based model to establish relationships between data elements via links

By migrating from traditional models libraries could begin to:

Integrate their resources in the web, increasing visibility & reaching new users

Offer users a richer resource discovery experience

Transition from costly specialist technologies & suppliers & widen their choice of options

Traditional Library Metadata Properties:

Proprietary, library specific standards

Passive

Self contained

Linear text - ‘Read’ by users as result of database query

Offers end result

‘Semantic’ Metadata Properties:

Open Standards

Dynamic/Reactive

Links to external resources

Micro Portal - Interacts with users & systems in response to queries

Offers options for further inquiry
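
The contrast between the two models can be illustrated with a small sketch. It is not taken from the presentation: the record values and every example.org URI are invented for illustration, and only the Dublin Core property URIs are real vocabulary terms.

```python
# Illustrative only: the record values and example.org URIs are made up.

# Traditional model: a self-contained record whose values are literal strings.
flat_record = {
    "title": "An Example Book",
    "author": "Smith, Jane",
    "subject": "Cookery",
}

# Linked data model: the same facts as (subject, predicate, object) triples,
# where an object can be a URI pointing at an external resource.
BOOK = "http://example.org/book/1"
triples = [
    (BOOK, "http://purl.org/dc/terms/title", "An Example Book"),
    (BOOK, "http://purl.org/dc/terms/creator", "http://example.org/person/jane-smith"),
    (BOOK, "http://purl.org/dc/terms/subject", "http://example.org/subject/cookery"),
]


def is_link(value: str) -> bool:
    """Treat http(s) values as links to further resources."""
    return value.startswith(("http://", "https://"))


def outgoing_links(statements):
    """Return the object URIs a client could follow for further inquiry."""
    return [obj for (_, _, obj) in statements if is_link(obj)]
```

Here `outgoing_links(triples)` returns the two URIs a user or system could follow, which is the sense in which the record offers options for further inquiry instead of an end result.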

Page 7: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…What to Offer?

Wanted to offer data allowing useful experimentation & advancing discussions from theory to practice

Why BNB?

General database of published output and not an institutional catalogue of unique items

Mass produced works on all subjects, many with internationally recognised identifiers e.g. ISBN

Reasonably uniform format across 60 years of publication

Significant amount of data – 3 million records in various languages

Page 8: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…What do we need to get there?

Wanted to undertake the work as an extension of existing activities and as an opportunity to develop expertise using:

Existing staff – librarians rather than IT experts

As many pre-existing tools or technologies as possible

Standard PC hardware for conversion

Library MARC21 data as a starting point

Established linked data resources to connect to

A proven platform that would enable us to concentrate on the data issues

Page 9: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…First stage: How To Migrate the Metadata?

From a flat catalogue card model to something more appropriate…

Preliminaries: Staff training in linked data modelling concepts & increased familiarisation with RDF & XML concepts

Experience of working with: JISC Open Bibliography Project & Others

Feedback on initial MARC to XML conversion work

Incremental approach adopted: Open Data License; RDF/XML Format; Add External Links; Re-model; Create Linked Data

Page 10: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey… Second stage: Selecting trusted resources to link to

To begin placing library data in a wider context & supplement or replace literal values in records

Looked for library sites: Dewey Info, LCSH SKOS, VIAF

Plus more general sites: GeoNames, Lexvo, RDF Book Mashup

Page 11: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…Third Stage: Matching and Generating Links

Three main approaches used:

Automatic Generation of URIs from elements in records e.g. DDC

Matching of text in records with linked data dumps e.g. personal names to VIAF & subjects to LCSH to identify URIs

Two stage crosswalk/matching process for some coded information e.g. MARC country & language codes for GeoNames
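
The three approaches above can be sketched roughly as follows. This is a minimal illustration and not the in-house tooling used for the BNB: the function names, the example.org URI patterns and the sample lookup tables are all assumptions made for the example.

```python
from typing import Optional


# Approach 1: build a URI directly from a value in the record.
# The base URI is a placeholder, not the pattern actually used.
def ddc_uri(ddc_number: str, base: str = "http://example.org/ddc/") -> str:
    return f"{base}{ddc_number.strip()}/"


# Approach 2: match text against a lookup built from a linked data dump.
def normalise(label: str) -> str:
    """Lower-case, collapse whitespace and drop trailing punctuation."""
    return " ".join(label.lower().split()).rstrip(" .,;:/")


def match_label(label: str, index: dict) -> Optional[str]:
    return index.get(normalise(label))


# Approach 3: two-stage crosswalk, code -> intermediate code -> URI.
def crosswalk(code: str, stage1: dict, stage2: dict) -> Optional[str]:
    intermediate = stage1.get(code.strip().lower())
    return stage2.get(intermediate) if intermediate else None


# Made-up sample data standing in for real dumps and code tables.
name_index = {normalise("Smith, Jane, 1950-"): "http://example.org/viaf/123"}
marc_to_iso = {"enk": "GB"}
iso_to_uri = {"GB": "http://example.org/geonames/GB"}
```

For example, `match_label("SMITH, Jane, 1950-.", name_index)` finds the URI despite differences in case and trailing punctuation, and `crosswalk("enk", marc_to_iso, iso_to_uri)` resolves the coded value in two lookups. Unmatched values return `None`, so they can be left as literals.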

Page 12: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…MARC to RDF Conversion Workflow

1) Selection: In-house utilities / MARC Report. Exclusions (CIP; multiparts; serials)

2) Pre-processing: MARC Global. Normalise data values, remove trailing punctuation, move/copy data values to improve machine matching/transformation

3) Character set conversion: In-house utilities. Decomposed UTF-8 converted to precomposed for conformance with W3C recommendations

4) URI creation: In-house utilities. Create BL URIs in MARC fields; harvest URIs from external sources

5) Data Transformation: MARC Report & MARC 21/RDF XSLT. Convert to RDF & insert URI prefixes

MARC to RDF conversion consists of multiple automated steps using a range of tools
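
Steps 2 and 3 can be sketched in a few lines. This is an illustration only, not the MARC Global or in-house utilities named above. It assumes that "precomposed" corresponds to Unicode Normalization Form C (NFC), and the function names and the set of punctuation characters are choices made for the example.

```python
import unicodedata


def strip_trailing_punctuation(value: str) -> str:
    """Remove ISBD-style punctuation that MARC subfields often end with.

    Caution: this also strips a final full stop from abbreviations,
    so a real workflow would need field-specific rules.
    """
    return value.rstrip().rstrip(" /:;,.=").rstrip()


def to_precomposed(value: str) -> str:
    """Convert decomposed sequences to precomposed form (assumed to be NFC)."""
    return unicodedata.normalize("NFC", value)


# "e" followed by a combining acute accent (decomposed), plus trailing " /".
decomposed = "Cafe\u0301 society /"
cleaned = to_precomposed(strip_trailing_punctuation(decomposed))
```

After both steps, `cleaned` holds the title with a single precomposed "é" character and no trailing slash, which makes string matching against external datasets more reliable.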

Page 13: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…MARC to RDF Conversion Workflow

[Diagram: Full BNB MARC21 File; MARC Pre-Processing (select single volume published books only; normalise for improved matching & transforms; convert to pre-composed UTF-8; create BL URIs and add external URIs by matching); Transform to RDF/XML using XSLT; BNB RDF/XML file; Load to Linked Data Platform; Generate RDF Triple Dump]
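
The workflow above transforms records with XSLT. As a rough illustration of what a triple dump contains, the sketch below uses plain Python in place of XSLT to write one record as N-Triples lines. The function names and the example.org subject URI are assumptions for the example, and only a minimal set of literal escapes is handled.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples string literals require escaping."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject: str, properties: dict) -> list:
    """Serialise one resource as N-Triples lines.

    Values that look like http(s) URIs become links; anything else
    is written as a plain string literal.
    """
    lines = []
    for predicate, value in properties.items():
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines
```

Each returned line is one triple, so a dump for the whole file is just the concatenation of these lines across all records.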

Page 14: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…Which took us from here...

Page 15: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…Via here...

Page 16: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…To here...

Page 17: Establishing the Connection: Creating a Linked Data Version of the BNB

bnb.data.bl.uk: Preview Options

bnb.data.bl.uk/sparql bnb.data.bl.uk/describe bnb.data.bl.uk/search

Includes: BNB Books 2005-11; 485,000 records; 18,000,000 RDF Triples
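
A client could address the SPARQL endpoint listed above along these lines. The sketch only builds the request URL and makes no network call. It assumes the endpoint is reachable over http and accepts the standard `query` parameter of the SPARQL Protocol; the sample query is generic and does not rely on the BNB data model.

```python
from urllib.parse import urlencode

# Host and path are from the slide; the http scheme is an assumption.
ENDPOINT = "http://bnb.data.bl.uk/sparql"


def build_query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': sparql})}"


# A generic query that asks for any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = build_query_url(QUERY)
```

The resulting URL can be pasted into a browser or passed to any HTTP client. Whether the service still responds at this address is not something the slides can confirm.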

Page 18: Establishing the Connection: Creating a Linked Data Version of the BNB

bnb.data.bl.uk: Sample ‘Labelled Concise Bound Description’

Page 19: Establishing the Connection: Creating a Linked Data Version of the BNB

Our Linked Data Journey…Journey’s End…Point?

Preview Details at:

http://www.bl.uk/bibliographic/datafree.html

Roadmap for next steps includes:

Staged release over coming months for: books, serials, multi-parts etc

Aiming to update on a monthly basis once complete

Documentation & further refinement of data model

Looking at RDF triple dump option

What else might be offered?

Page 20: Establishing the Connection: Creating a Linked Data Version of the BNB

Lessons Learned on the Journey: General

It is a new way of thinking

Legacy data wasn’t designed for this purpose so starting can be problematic

There are many opinions…but few real certainties

Everyone is learning & multiple solutions exist so you may be the best judge

Don’t reinvent the wheel...there are often tools or experience you can use. Start simple & develop in line with evolving staff expertise

Give careful thought to data modelling & sustainability issues e.g.

Where possible use cross domain standards e.g. ISO codes in data

Select relevant & stable targets when providing links if you are doing so

Page 21: Establishing the Connection: Creating a Linked Data Version of the BNB

Lessons Learned on the Journey: Data Issues

Reality check by offering samples for feedback to wider groups

Be prepared for some technical criticism in addition to positive feedback & try to continually improve in response

Conversion inevitably identifies hidden data issues…& creates new ones!

…But it’s often better to release an imperfect something than a perfect nothing!

Page 22: Establishing the Connection: Creating a Linked Data Version of the BNB

Lessons Learned Along The Way: Staff and Resource Issues

It can be a steep learning curve so:

Look for training opportunities to develop staff skills to support new open metadata standards

Cultivate a culture of enquiry & innovation among staff to widen perspectives on new possibilities

Look into collaborative pilot projects with peer organisations to share resources & expertise

See what tools are already out there that can save you development time or assist in checking data

Page 23: Establishing the Connection: Creating a Linked Data Version of the BNB

Final Thoughts…For Others Contemplating a Similar Journey

It’s never going to be perfect first time

We expect to make mistakes

We aim to learn from them

We hope others will learn something too

… and that everyone benefits from the experience

So if anyone is thinking of undertaking a similar journey…

Just do it!

Page 24: Establishing the Connection: Creating a Linked Data Version of the BNB

Any Questions…?

bnb.data.bl.uk/sparql bnb.data.bl.uk/describe bnb.data.bl.uk/search

Images from