
Dublin Core and metadata: a tutorial

Lorcan Dempsey

Andy Powell

UKOLN, University of Bath

(with a little help from our friends)

http://www.ukoln.ac.uk/metadata

2 - Lux, 1-2 Dec 1997

Questions for you ...

• Metadata
• EAD, CIMI, TEI
• PICS, XML, RDF
• MARC
• 856
• Dublin Core
• you are
  • geeks/people with sensible shoes
  • goers/doers

Overview

• UKOLN and metadata

• Metadata landscape

• Dublin Core

• Metadata management

• Interoperability

• Harvesting

• Future

UKOLN and metadata

• ROADS
  • subject gateways
  • WHOIS++ templates
• BIBLINK
  • CIP for electronic data
  • Dublin Core (+ MARC)
• Desire
  • WHOIS++, GILS, Dublin Core
  • Z39.50/WHOIS++
• NewsAgent
  • current awareness, Ariadne
• Dublin Core, DC-dot
• MODELS
  • collection description??
• Agora
• PRIDE
• Initiatives

Metadata landscape

What is metadata …?

• It’s just cataloguing, isn’t it …?
• Yes and no …
• Data which supports operations carried out on information objects …
  – discover, buy, ...
• In the company of strangers (Brody)
• Relieve user of having to have full advance knowledge of characteristics of resources …

… variety

Metadata model: the library example

[Figure: the library metadata model. Labels: semantics, syntax, content; MARC, ISO 2709, AACR2; Libraries. Picture by Stu Weibel]

Variety of formal and informal metadata models

[Figure: communities with their own metadata models. Labels: Museums, Geospatial, Libraries, Internet Commons, Commerce, Scientific Data, Home Pages, Whatever... Picture by Stu Weibel]

Variety of operations ...

• Discovery
• Location
• Selection
  • fit for use
• Acquire
  • terms
• Manipulate
• Exploit
  • IPR
• Document
• Contextualise
• Preserve
• Manage
  • dates, people, structures, …
• Agent/client access
• …

Variety of sectors ...

• Curatorial traditions
  • ‘cataloguing’/documentation
  • libraries, archives, text archives, museums, geospatial data, etc
• Network resource discovery
  • directory services, search engines, etc
  • influence from computer science
• Network information management
  • web developments, W3C, database
  • sitemap, time to live, ...
  • pragmatic - market needs, vendor push

Variety of creation models ...

• Author/creator
  • web pages?
• Repository/site manager
  • effective disclosure
  • better management
• Third party creator
  • e.g. eLib subject gateways
  • Library

Metadata ...

• Variety of metadata models
  • syntax, semantics, content
  • scope
  • sectors/domains
• Variety of operations supported
• Variety of creation models
• Variety of architectures for disclosure/discovery
  • Search and retrieve
  • Disclosure/distribution
  • Management

… complex

Some formats

Band One (full text indexes)
  • Proprietary formats

Band Two (simple structured generic formats)
  • Proprietary formats
  • Dublin Core
  • IAFA/WHOIS++ templates
  • RFC 1807
  • …

Band Three (more complex structure, domain specific)
  • FGDC
  • ‘MARC’
  • GILS
  • …

Band Three (part of a larger semantic framework)
  • TEI headers
  • ICPSR
  • EAD
  • CIMI
  • …

(syntax/semantics?)

richer … semantics, structure, domain-specific, ...

Dublin Core in the metadata landscape

Dublin Core

• Metadata model
  • Simple element set
  • focus on semantics - several target syntaxes
• Operations
  • resource discovery on the web
• Explicitly cross sector/domain
• No constraint on creation model or application architecture

[Figure: Dublin Core shown alongside FGDC, MARC, Museum, ...]

… simple and intuitive

Dublin Core - why success?

• Simple
• Coincides with strategic needs in each of the sectors we identified
  – Curatorial: semantic interoperability between richer metadata models
  – Resource discovery: a simple format for descriptive metadata (DLOs)
  – Web management: associate metadata with Web resources
• Inclusive (countries/domains/traditions)
• Stu Weibel

Introduction to Dublin Core

Dublin Core - elements

• Title • Subject • Description • Creator • Publisher • Contributor • Date • Type

• Format • Identifier • Source • Language • Relation • Coverage • Rights

• 15 element core metadata set

Dublin Core - HTML Example

<HTML><HEAD>

<TITLE>UKOLN Home Page</TITLE>

<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">

<META NAME="DC.Subject" CONTENT="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">

<META NAME="DC.Description" CONTENT="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">

<META NAME="DC.Creator" CONTENT="Isobel Stark">

</HEAD>

...
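A minimal sketch of how tags like those in the example might be produced programmatically, assuming the metadata is held as a simple mapping from element name to one or more values. The function name dc_meta_tags and the record layout are illustrative only and are not part of any Dublin Core specification.

```python
from html import escape


def dc_meta_tags(record):
    """Render {element: value or list of values} as HTML <META> tags."""
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # allow a single value without a list
        for value in values:
            # escape() protects quotes and angle brackets inside CONTENT
            lines.append('<META NAME="DC.%s" CONTENT="%s">'
                         % (element, escape(value, quote=True)))
    return "\n".join(lines)


record = {
    "Title": "UKOLN: UK Office for Library and Information Networking",
    "Creator": "Isobel Stark",
}
print(dc_meta_tags(record))
```

Repeating an element, for example several Subject values, simply yields several tags with the same NAME.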

Management

Data creation

Practical issues of using Dublin Core for Internet resource description...

• UKOLN metadata system
  • Requirements
  • 3 models for metadata management
  • Implementation at UKOLN


UKOLN metadata system requirements

• Easy to use

• Work with a variety of methods of creating HTML

• Simple migration to future metadata formats

• Separate metadata from resource

Managing Dublin Core (1): HTML authoring tool

Embed by hand using HTML or text editor

Pros…
• Simple
• May be useful for training and familiarisation

Cons…
• May not be possible with all editors
• Maintenance problems
• Easy to make errors


DC-dot

• A Web based tool for creating Dublin Core <meta> tags

• Automatic generation of some tags based on content of the resource

• Forms based editing of tags

• Cut-and-paste output into HTML

• Conversion to other formats…
  • SOIF, ROADS/WHOIS++, USMARC, GILS...

http://www.ukoln.ac.uk/metadata/dcdot/

Run demo
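The idea of generating some tags automatically from the content of the resource can be sketched as follows. This is not DC-dot's actual implementation. It is a minimal illustration using Python's standard html.parser, and the choice of which page features feed which Dublin Core element (title text to DC.Title, plain description and keywords tags to DC.Description and DC.Subject) is an assumption made for the example.

```python
from html import escape
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and any plain NAME/CONTENT meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def suggest_dc_tags(html_text):
    """Return suggested Dublin Core <META> tags for a page (hypothetical helper)."""
    page = PageSummary()
    page.feed(html_text)
    suggestions = []
    if page.title.strip():
        suggestions.append(("DC.Title", page.title.strip()))
    if "description" in page.meta:
        suggestions.append(("DC.Description", page.meta["description"]))
    if "keywords" in page.meta:
        suggestions.append(("DC.Subject", page.meta["keywords"]))
    return ['<META NAME="%s" CONTENT="%s">' % (name, escape(value, quote=True))
            for name, value in suggestions]


page = ('<HTML><HEAD><TITLE>UKOLN Home Page</TITLE>'
        '<META NAME="Keywords" CONTENT="xxx, yyy"></HEAD></HTML>')
print("\n".join(suggest_dc_tags(page)))
```

The suggestions would then be reviewed and edited by a person before being pasted into the page, which is the role of the forms-based editing step.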

Managing Dublin Core (2): Web-site management tool

Use Web-site management tool, for example NetObjects Fusion

Pros…
• Use of Web-site management tools likely to increase
• Object-oriented database approach

Cons…
• Proprietary formats
• Early days - too early to evaluate use for metadata yet?

Managing Dublin Core (3): On the fly generation

Hold Dublin Core separately and embed on-the-fly using server-side include (SSI)

Pros…
• Separates metadata from resource
• Future migration fairly simple

Cons…
• Performance
• Lack of integration with HTML tools
• Server specific


UKOLN metadata system (1)

• Embed on-the-fly

• Apache SSI script

• Store metadata using SOIF records

• Use MS-Access as tool to create the records

• Associate metadata with resource by co-locating them in the Web server filestore

UKOLN metadata system (2)

[Diagram: the HTML editor maintains intro.html; the MS-Access database maintains intro.html.soif]

intro.html:

<html><head><title>…</title><!--#exec cmd="getmeta" --></head>...

intro.html.soif:

@FILE { http://www.ukoln.ac....
keywords{13}: xxx, yyy, zzz
description{14}: blah blah b
author{13}: Stark, Isobel
...
}

Apache syntax for calling server-side script: <!--#exec cmd="getmeta" -->
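A sketch of the kind of work a script such as getmeta has to do: read the co-located SOIF record and emit <META> tags. This is not the UKOLN script. The function names, the example URL and the mapping from SOIF attribute names to Dublin Core elements are assumptions for illustration. The number in braces is taken to be the length of the value in bytes, which matches the counts shown on the slide.

```python
import re
from html import escape

# attribute-name{size}: followed by a tab or space, then exactly `size` bytes
ATTR = re.compile(rb"([^\s{}]+)\{(\d+)\}:[ \t]")

# Assumed mapping from the SOIF attributes on the slide to Dublin Core names
SOIF_TO_DC = {
    "keywords": "DC.Subject",
    "description": "DC.Description",
    "author": "DC.Creator",
}


def parse_soif(data):
    """Parse one SOIF object; return (template, url, {attribute: value})."""
    header = re.match(rb"@(\w+)\s*\{\s*(\S+)\s*\n", data)
    if header is None:
        raise ValueError("not a SOIF record")
    template, url = header.group(1).decode(), header.group(2).decode()
    pos, attributes = header.end(), {}
    while True:
        match = ATTR.match(data, pos)
        if match is None:
            break  # reached the closing brace or unparseable input
        size = int(match.group(2))
        start = match.end()
        value = data[start:start + size].decode("latin-1")
        attributes[match.group(1).decode().lower()] = value
        pos = start + size
        while data[pos:pos + 1] in (b"\n", b"\r"):
            pos += 1  # skip the line break after each value
    return template, url, attributes


def soif_to_meta(data):
    """Render the mapped attributes of a SOIF record as <META> tags."""
    _template, _url, attributes = parse_soif(data)
    return "\n".join(
        '<META NAME="%s" CONTENT="%s">' % (SOIF_TO_DC[name], escape(value, quote=True))
        for name, value in attributes.items() if name in SOIF_TO_DC)


record = (b"@FILE { http://www.example.org/intro.html\n"  # hypothetical URL
          b"keywords{13}:\txxx, yyy, zzz\n"
          b"author{13}:\tStark, Isobel\n"
          b"}\n")
print(soif_to_meta(record))
```

Under SSI, a command invoked with #exec cmd has its standard output spliced into the page, so printing the tags is all the script needs to do.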

UKOLN metadata system (3)

MS-Access frontend...

[Screenshot: MS-Access front end, with filename browser, text boxes, name choosers and UKOLN specific metadata]

UKOLN metadata system (4)

[Diagram, with steps numbered 1 to 6: a Web robot, the UKOLN Web server, intro.html containing <!--#exec cmd="getmeta" -->, the SSI script, and the SOIF record intro.html.soif]

Issues

• Performance
• Interaction with Web caches
• Dublin Core vs Alta Vista style metadata

<META NAME="Description" CONTENT="blah, blah">
<META NAME="Keywords" CONTENT="xxx, yyy, zzz">

• Granularity
  • Which pages should have metadata?

A short history: Dublin to Helsinki

We have borrowed some of this material from Stu Weibel, with permission

Dublin Core Workshop Series ..

• DC-1: OCLC/NCSA Metadata Workshop, Mar 1995
  • Limited scope: discovery of document-like objects
  • 13 element Dublin Core
  • Interdisciplinary consensus
• DC-2: OCLC/UKOLN Warwick Workshop, April 1996
  • Warwick Framework - modularity
  • Syntax issues

.. Dublin Core Workshop Series

• DC-3: CNI/OCLC Image Metadata Workshop, Sep 1996
  • Images are in scope
  • 15 element core; some element name changes
• DC-4: Canberra Metadata Workshop, Mar 1997
  • Minimalists and Structuralists
  • Canberra Qualifiers (additional information useful for interpretation of metadata)

Dublin Core - qualifiers

• Language of element value
• Scheme
  • specifies a context for interpretation

<META NAME="DC.Subject" SCHEME="ddc.21" CONTENT="170.42">

• Sub-element
  • specifies a facet - narrows

<META NAME="DC.Creator.Address" CONTENT="[email protected]">
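A small sketch of how a harvester might take the dotted NAME apart and keep the qualifiers next to the value. The function names and the shape of the returned dictionary are assumptions for illustration.

```python
def split_dc_name(name):
    """Split 'DC.Creator.Address' into ('Creator', 'Address').

    The sub-element is None when the name has no second dotted part.
    """
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        raise ValueError("not a Dublin Core name: %r" % name)
    element = parts[1]
    sub_element = ".".join(parts[2:]) or None
    return element, sub_element


def describe(name, content, scheme=None, lang=None):
    """Bundle an element value with its three kinds of qualifier."""
    element, sub_element = split_dc_name(name)
    return {
        "element": element,
        "sub_element": sub_element,  # narrows the element to a facet
        "scheme": scheme,            # context for interpreting the value
        "lang": lang,                # language of the element value
        "value": content,
    }


print(describe("DC.Subject", "170.42", scheme="ddc.21"))
print(split_dc_name("DC.Creator.Address"))
```

A consumer that does not understand a sub-element can fall back to the bare element, which is one reason the dotted form degrades gracefully.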

DC-5

• DC-5: National Library of Finland/OCLC Workshop, October 1997
  – Formal Data Model (expressed in RDF)
    – many other problems are hereby made simpler
    – Resource Description Framework
    – The return of modularity
  – Finnish finish (of unqualified DC)
    – minimalist DC is done and will not be changed
  – Semantics for additional sub-structure
    – a small number of sub-elements will be established
  – Closer DC-W3C collaboration

Working groups

• Data Model
  • date, relationship, source
  • what is a resource?
  • 1:1
  • RDF
• Relationships
  • Typology
• Sub-elements
• Date

RFCs in preparation

• Simple DC semantics (the minimalist position)
• Simple DC syntax for embedded HTML
• DC semantics with qualifiers
• DC syntax with qualifiers
  • HTML 2.0
  • HTML 4.0
  • RDF

Dublin Core implementation

Projects

• 30 projects; 10 countries
  http://purl.org/metadata/dublin_core/projects.html

• “Interdisciplinary and international recognition as the lingua franca for resource discovery metadata for electronic resources” Stu Weibel

• Support for use for non-digital objects

The HTML 2.0 “kludge”

• Convention for simple embedded metadata
• Bootstrapping early Dublin Core deployments
  • META tags and standard HTML syntax
• Useful for simple metadata without qualifiers
• Can support Dublin Core qualifiers, but with risks for interoperability and indexing purity

<META NAME="DC.Subject" CONTENT="(SCHEME=LCSH)Information technology -- higher education">

HTML 4.0 - DC influences the web

• Richer <META> tag attributes
  • LANG (language of the metadata)
  • SCHEME (formal qualifier)
  • SUB-ELEMENTS (dot syntax extensions)
• Allows syntactically “clean” implementation of metadata with qualifiers

<META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Information technology -- higher education">
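The difference between the two conventions can be sketched as a small converter that peels a leading "(SCHEME=...)" group off the CONTENT value and re-emits it as a proper attribute. The function names are hypothetical, and accepting several leading "(NAME=value)" groups in a row is an assumption, since the slides show only a single SCHEME.

```python
import re
from html import escape

# One leading "(NAME=value)" group, as in the HTML 2.0 convention
QUALIFIER = re.compile(r"\s*\((\w+)=([^)]*)\)")


def split_kludge(content):
    """Return ({qualifier: value}, remaining content)."""
    qualifiers, pos = {}, 0
    while True:
        match = QUALIFIER.match(content, pos)
        if match is None:
            break
        qualifiers[match.group(1).upper()] = match.group(2)
        pos = match.end()
    return qualifiers, content[pos:].strip()


def to_html4(name, content):
    """Rewrite an HTML 2.0 style tag with qualifiers as real attributes."""
    qualifiers, value = split_kludge(content)
    attrs = ['NAME="%s"' % name]
    attrs += ['%s="%s"' % (key, escape(val, quote=True))
              for key, val in qualifiers.items()]
    attrs.append('CONTENT="%s"' % escape(value, quote=True))
    return "<META %s>" % " ".join(attrs)


print(to_html4("DC.Subject",
               "(SCHEME=LCSH)Information technology -- higher education"))
```

An indexer unaware of the convention would index "(SCHEME=LCSH)" as part of the subject string, which is the indexing purity risk with the HTML 2.0 form.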

Some quick statistics

• UK (academic sites only)
  • Total pages: ~1.5M (a guess!)
  • Embedded DC: ‘a few hundred’
  http://www.cs.ukc.ac.uk/people/staff/djb1/
  Information provided by Dave Beckett
• Sweden
  • Total pages: 1.4M
  • Embedded DC: ‘a few dozen’
  http://www.lub.lu.se/nwiPaper/
  Information provided by Sigfrid Lundberg

Interoperability


Interoperability

• What do we mean by interoperability?

• Issues

• Z39.50 and Dublin Core

• Metadata registries

Interoperability?

• Unify access to data in different domains - Web, library, museums, archives, ...
• Issues
  • Protocols - Z39.50, WHOIS++, …
    – gateways
  • Attribute names - author/creator/...
    – Semantic interoperability - mapping tables
  • Format of results
    – format converters

In real life these can all get mixed up


Protocol Gateways - an example

• ZEXI - a Z39.50 to WHOIS++ gateway

• Based on CNIDR's Isite

• Accepts Z39.50 searches

• Converts them to WHOIS++

• Returns SUTRS records

http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw

Attribute names

• Different databases may use different ‘names’ for the same thing
  • ‘creator’ vs ‘author’

• Need to be able to construct searches that ‘work’ against different databases irrespective of the ‘names’ in use

• Dublin Core provides a minimal set of agreed ‘names’ with which we can construct searches
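The use of Dublin Core as a set of agreed names can be sketched as a mapping table per database, with queries phrased once in Dublin Core terms and then translated. The database names and their local field names below are invented for the example.

```python
# Hypothetical mapping tables: Dublin Core element -> local field name
DC_TO_LOCAL = {
    "library_catalogue": {"Creator": "author", "Title": "title", "Subject": "subject"},
    "subject_gateway": {"Creator": "creator", "Title": "name", "Subject": "keywords"},
}


def translate_query(query, target):
    """Rewrite a {DC element: term} query using the target's own field names."""
    mapping = DC_TO_LOCAL[target]
    unsupported = [element for element in query if element not in mapping]
    if unsupported:
        raise KeyError("no mapping for %s in %s" % (unsupported, target))
    return {mapping[element]: term for element, term in query.items()}


query = {"Creator": "Stark", "Title": "metadata"}
for target in DC_TO_LOCAL:
    print(target, translate_query(query, target))
```

The same query object is sent to every target, and only the mapping table differs, which is what keeps the number of mappings linear in the number of databases rather than pairwise.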

Format of results

• Different databases may return results in different formats
  • USMARC, GRS-1, SUTRS, IAFA, ...

• Early stages of searching ideally need results to be returned in single ‘simple’ format

• Dublin Core provides a minimal set of agreed data elements with which we can construct results

Z39.50 and DC - searching

• Version 2
  • Searches phrased in terms of single attribute set only
  • Either need to
    – add DC attributes to Bib-1
    – map DC to Bib-1
• Version 3
  • Multiple attribute sets allowed for searching
  • New simple DC attribute set to be proposed
  • Other attributes taken from Bib-1

http://cypress.dev.oclc.org:12345/~rrl/docs/dublincoreandz3950.html
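The "map DC to Bib-1" option can be sketched as a lookup from Dublin Core element to a Bib-1 Use attribute value. The numbers below are existing Bib-1 Use attributes (4 Title, 21 Subject heading, 62 Abstract, 1003 Author, 1018 Publisher), but pairing each with a Dublin Core element is an illustrative choice and not an agreed mapping. The query is rendered in prefix query notation (PQF), as used by Index Data's tools, purely as a convenient way of writing it down.

```python
# Illustrative pairing of Dublin Core elements with Bib-1 Use attribute values
DC_TO_BIB1_USE = {
    "Title": 4,           # Bib-1 Title
    "Creator": 1003,      # Bib-1 Author
    "Subject": 21,        # Bib-1 Subject heading
    "Publisher": 1018,    # Bib-1 Publisher
    "Description": 62,    # Bib-1 Abstract
}


def dc_to_pqf(element, term):
    """Render a single-term search on a DC element as a PQF string."""
    if '"' in term:
        # quoting rules are outside the scope of this sketch
        raise ValueError("terms containing double quotes are not handled")
    use = DC_TO_BIB1_USE[element]
    return '@attr 1=%d "%s"' % (use, term)  # attribute type 1 is Use


print(dc_to_pqf("Title", "metadata"))
print(dc_to_pqf("Creator", "Stark"))
```

Elements with no entry in the table, such as Rights, cannot be searched this way, which is part of the motivation for a separate Dublin Core attribute set under Version 3.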

Z39.50 and DC - retrieval

• To return Dublin Core ‘records’ using Z39.50…
  • use GRS-1 (Generic Record Syntax)
  • elements are assigned tags
  • DC elements have been added to tagset-G

Format conversion - issues

• Simple to rich, e.g. DC to MARC
  • May not generate valid rich record without manual enhancement
  • Use of DC qualifiers required for decent MARC record
• Rich to simple, e.g. MARC to DC
  • Loss of data
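Both directions of the problem can be seen in a toy converter. The MARC tags used are real (100 main entry personal name, 245 title statement, 260 publication details, 520 summary, 650 topical subject, 653 uncontrolled index term), but the mapping tables are deliberately tiny and illustrative, not a published crosswalk, and the sample record is made up.

```python
# Illustrative tables only
MARC_TO_DC = {"100": "Creator", "245": "Title", "520": "Description",
              "650": "Subject", "653": "Subject"}
DC_TO_MARC = {"Creator": "100", "Title": "245", "Description": "520",
              "Subject": "653"}


def marc_to_dc(fields):
    """Rich to simple: fields is a list of (tag, [(subfield code, value), ...])."""
    dc = {}
    for tag, subfields in fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no home in the simple format: the field is dropped
        # subfield codes are discarded and the values run together
        text = " ".join(value for _code, value in subfields)
        dc.setdefault(element, []).append(text)
    return dc


def dc_to_marc(dc):
    """Simple to rich: every value lands in subfield a of a single tag."""
    return [(DC_TO_MARC[element], [("a", value)])
            for element, values in dc.items() for value in values]


fields = [
    ("245", [("a", "Dublin Core and metadata:"), ("b", "a tutorial")]),
    ("650", [("a", "Metadata")]),
    ("653", [("a", "resource discovery")]),
    ("260", [("c", "1997")]),
]
simple = marc_to_dc(fields)
rebuilt = dc_to_marc(simple)
print(simple)
print(rebuilt)
```

After the round trip the 260 field has gone, the title has lost its subfields, and the controlled 650 heading comes back as an uncontrolled 653. Recovering that distinction is the kind of thing qualifiers such as SCHEME would be needed for.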

Metadata registries

• Semantics
  • Agreement on element meanings
  • Agreement on enumerated lists
• Qualifiers
  • Thesaurus naming
• Publishing existing metadata sets
  • Re-use by others - prevent duplication of work
  • e.g. Administrative metadata

Some pointers

• Mapping tables
  http://www.ukoln.ac.uk/metadata/interoperability/
• Software
  • General
    http://www.ukoln.ac.uk/metadata/software-tools/
  • d2m : Dublin Core to MARC converter
    http://www.bibsys.no/meta/d2m/
  • USEMARCON
    http://www2.echo.lu/libraries/en/projects/usemarc.html

Harvesting

Harvesting Dublin Core

• General Issues
• Building a Web index
  • Harvest and NWI
• Building a ‘local’ search engine
  • Harvest, SWISH-E, Isite, Zebra
• DC as cataloguer’s aid

Harvesting - issues

• Mappings
• Multiple element values
• Multiple languages
• Complex data values
  • e.g. DC.Date, DC.Coverage

• SCHEMES
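Several of these issues show up as soon as one tries to pull embedded Dublin Core out of a page. The sketch below, using Python's standard html.parser, keeps every occurrence of a repeated element and carries SCHEME and LANG along with each value rather than discarding them. The class and function names are hypothetical, and this is not the Harvest or NWI summariser.

```python
from html.parser import HTMLParser


class DublinCoreHarvester(HTMLParser):
    """Collect every <META NAME="DC. ..."> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lower-cased
        name = attrs.get("name") or ""
        if not name.lower().startswith("dc."):
            return  # not Dublin Core, e.g. plain Keywords
        self.records.append({
            "element": name[3:],                 # original case is preserved
            "value": attrs.get("content") or "",
            "scheme": attrs.get("scheme"),       # None when absent
            "lang": attrs.get("lang"),           # None when absent
        })


def harvest(html_text):
    parser = DublinCoreHarvester()
    parser.feed(html_text)
    return parser.records


page = ('<HEAD><META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Metadata">'
        '<META NAME="dc.subject" LANG="sv" CONTENT="Metadata">'
        '<META NAME="Keywords" CONTENT="xxx"></HEAD>')
for record in harvest(page):
    print(record)
```

Whether "DC.Subject" and "dc.subject" should be treated as the same element, and how values in different schemes or languages are indexed together, is left to the mapping step.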


Harvesting - issues

• Frames

• Harvesting non-embedded metadata

• HTML 3.2 vs HTML 4.0

• Hidden pages

• Controlling the robot
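One common way of controlling a robot is a robots.txt file on the server. The sketch below checks a URL against such rules with Python's standard urllib.robotparser. The robot name, the rules and the URLs are made up for the example, and the slides do not say which control mechanisms were intended.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: keep all robots out of /private/
rules = """User-agent: *
Disallow: /private/
"""

print(allowed(rules, "dc-harvester", "http://www.example.org/index.html"))
print(allowed(rules, "dc-harvester", "http://www.example.org/private/x.html"))
```

Pages excluded in this way are never fetched, so any Dublin Core embedded in them is invisible to the harvester.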


Harvest

• Resource discovery suite of tools - robot, summarisers, indexers

• SOIF records

• Supports a variety of indexers

• Supports database brokerage model

• CGI based user-interface

• UKOLN’s HTML summariser is Dublin Core aware

http://www.tardis.ed.ac.uk/harvest/


Nordic Web Index

• Custom robot - NWI/Combine

• Dublin Core aware

• GILS-II records

• Indexed using Zebra

• Searched using Z39.50

• User interface based on Europagate

http://nwi.ub2.lu.se/?lang=uk

Other software

• SWISH-E
  • system for indexing local collections of Web pages or other text files
  http://sunsite.berkeley.edu/SWISH-E/
• Isite
  • text indexer (Isearch) and Z39.50
  http://www.cnidr.org/ir/isite.html
• Zebra
  • text indexer and Z39.50
  http://www.indexdata.dk

DC as cataloguer’s aid

• ROADS
  • Software to create, manage and search Internet resource descriptions
  • WHOIS++
  • Records created manually
  • ‘Pump-prime’ metadata record with values based on embedded DC using robot

http://www.ukoln.ac.uk/roads/

DC as cataloguer’s aid

• BIBLINK
  • Flow of information from publishers to National Bibliographic Agencies
  • MARC based catalogues of electronic publications
  • Initial MARC record based on DC description supplied by publisher using email

http://www.ukoln.ac.uk/metadata/BIBLINK/

Dublin Core - critique

Limits

• In development
• Syntax
• Simple
• Discovery
• Document like objects
• Weak model
• Administrative metadata

• Addressed in Helsinki

Futures

The material on RDF has been adapted from Stu Weibel’s material, with permission

Dublin Core futures

• Internal
  • Syntax and semantics

• External environment

Syntax

• HTML 2, HTML 4, RDF, ...
• RDF - W3C (World Wide Web Consortium) initiative
• “RDF is the realization of the Warwick Framework for the Web”
• RDF will be the foundation for an architecture for metadata on the Web
  • Resource description
  • Electronic commerce
  • Site mapping
  • Third party rating
  • Digital signatures

RDF: Why is it important?

• RDF provides a coherent data model and syntactical framework for ‘plug-n-play’ metadata
  • the semantics and structure of metadata packages will be determined by stakeholder communities via independently developed and maintained metadata element sets
  • e.g.: MARC, DC, TEI, GILS, CIMI, Ratings….
• Political imperatives for deployment
• Software infrastructure will be ubiquitous (and come for free in browsers and servers)

Semantics

• Tension
  • simple vs complex
  • generic vs specific
  • interoperability vs selfstanding
• Development
  • relationship
  • sub-elements
  • scheme

Environment

• ‘Save the time of the user’
• Diverse resources
  • Broker/middleware/gateway/trading place/…
• Variety of protocols and metadata models
• DC
  • simple - volume
  • ‘shallow’ - interop

Further Information

• Dublin Core Home Page
  http://purl.org/metadata/dublin_core
• W3 Metadata Overview and RDF Working Group Home Page
  http://www.w3.org/Metadata/RDF
• UKOLN metadata page
  http://www.ukoln.ac.uk/metadata/