A Digital Library Repository Utilizing the
Open Archives Initiative
Developed to meet the needs of UTK Library Special Collections
The Problem:
Tremendous quantities of valuable information exist in museums, libraries, and research centers
which are not available in a standardized format via centralized search engines:
Photos and videos
Scientific records
Mathematical findings
Musical scores and sound tracks
Historical documents
Theses and dissertations
How to make the connection?
The Open Archives Solution:
Translation of records into a common format and language: XML & Unqualified Dublin Core
Storage of these translations
Response to a standardized set of queries
Gather document descriptions from repositories into large databases, using OAI harvesters.
Set up search engines to offer up information in these databases.
Required For Translation:
Understanding of XML and XML schemas
Determining correct mapping of information to Unqualified Dublin Core Elements, in order to translate legacy files into a metadata format supported by the Open Archives Initiative
Scripts to reduce the labor of translation
A Common Language: Dublin Core
The 15 elements of Unqualified Dublin Core:
Content: Title, Description, Coverage, Relation, Source, Subject, Type
Intellectual Property: Contributor, Creator, Publisher, Rights
Instantiation: Date, Format, Identifier, Language
A Common Framework: XML schemas
The XML schema constrains each element of the document, providing rules and framework for parsing:
<complexType name="dublincoreType">
  <choice minOccurs="0" maxOccurs="unbounded">
    <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
  </choice>
</complexType>
</schema>
A Common Format: XML
From a TEI Lite SGML file segment:
<PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST><ITEM>Letters</ITEM><ITEM>Cherokee Indians--Claims against</ITEM><ITEM>Tennessee</ITEM></LIST></KEYWORDS></TEXTCLASS></PROFILEDESC></TEIHEADER>
To an Unqualified Dublin Core XML file segment:
<subject>Letters</subject>
<subject>Cherokee Indians Claims against</subject>
<subject>Tennessee</subject>
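The keyword-to-subject step can be sketched as a short script. This is a minimal illustration, not the script used for the collection: it assumes the TEI segment has already been cleaned up enough to parse as XML, the function name keywords_to_subjects is hypothetical, and item text is copied unchanged with no punctuation normalization.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed version of the TEI Lite keyword segment,
# with the LCSH subdivision written as a double hyphen.
TEI_SEGMENT = (
    '<PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>'
    "<ITEM>Letters</ITEM>"
    "<ITEM>Cherokee Indians--Claims against</ITEM>"
    "<ITEM>Tennessee</ITEM>"
    "</LIST></KEYWORDS></TEXTCLASS></PROFILEDESC>"
)


def keywords_to_subjects(tei_xml: str) -> list[str]:
    """Return one Dublin Core <subject> line per non-empty ITEM element."""
    root = ET.fromstring(tei_xml)
    lines = []
    for item in root.iter("ITEM"):
        text = (item.text or "").strip()
        if text:
            subject = ET.Element("subject")
            subject.text = text  # ElementTree escapes &, < and > on output
            lines.append(ET.tostring(subject, encoding="unicode"))
    return lines


if __name__ == "__main__":
    for line in keywords_to_subjects(TEI_SEGMENT):
        print(line)
```

Real TEI Lite SGML files may need an SGML-aware parser or a cleanup pass first; that step is outside this sketch.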
Selected Portions of a TEI-Lite SGML record
<TEIHEADER> <FILEDESC> <TITLESTMT><TITLE>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town / William Holland Thomas: a machine-readable transcription of an image</TITLE>…
<AUTHOR>Thomas, William Holland</AUTHOR> …
<PUBLISHER>The University of Tennessee Libraries</PUBLISHER>
<IDNO>wt025</IDNO> …
<AVAILABILITY><P>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</P></AVAILABILITY></PUBLICATIONSTMT>
<SOURCEDESC><BIBL>…
<DATE VALUE="1839-07-08">July 8, 1839</DATE>…
<NOTE TYPE="summary">This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</NOTE> …
<PROFILEDESC> <TEXTCLASS> <KEYWORDS SCHEME="LCSH"><LIST> <ITEM>Cherokee Indians</ITEM> <ITEM>Government relations</ITEM> </LIST> </KEYWORDS></TEXTCLASS></PROFILEDESC>…
<TEXT><BODY><DIV1 TYPE="letter">…
… Translated to XML Unqualified Dublin Core
<title>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town</title>
<contributor>The University of Tennessee Libraries, Knoxville</contributor>
<contributor>Southeastern Native American Documents Collection (GALILEO (Georgia statewide project)) GAGAL</contributor>
<creator>Thomas, William Holland</creator>
<publisher>The University of Tennessee Libraries</publisher>
<date>July 8, 1839</date>
<description>This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</description>
<identifier>Document ID: wt025</identifier>
<identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
<subject>Cherokee Indians</subject>
<subject>Government relations</subject>
<rights>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</rights>
<type>letter</type>
<type>computer file</type>
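The record-level translation above amounts to a crosswalk table from TEI header elements to Dublin Core elements. The sketch below reads one possible table off the example; the names TEI_TO_DC and crosswalk are hypothetical, the field values are assumed to have been extracted from the header already, and values are passed through unchanged (so it does not add the "Document ID:" prefix or the contributor and type lines shown in the translated record).

```python
from xml.sax.saxutils import escape

# Crosswalk inferred from the example record: TEI header element -> DC element.
# This table is an assumption for illustration, not a published mapping.
TEI_TO_DC = {
    "TITLE": "title",
    "AUTHOR": "creator",
    "PUBLISHER": "publisher",
    "DATE": "date",
    "NOTE": "description",
    "IDNO": "identifier",
    "ITEM": "subject",
    "AVAILABILITY": "rights",
}


def crosswalk(tei_fields):
    """Map (TEI element, value) pairs to DC lines; unmapped elements are skipped."""
    lines = []
    for tei_name, value in tei_fields:
        dc_name = TEI_TO_DC.get(tei_name.upper())
        if dc_name is None:
            continue  # e.g. body text, which is not part of the metadata
        lines.append(f"<{dc_name}>{escape(value.strip())}</{dc_name}>")
    return lines


# Values taken from the sample TEI header; repeated elements are allowed.
SAMPLE = [
    ("AUTHOR", "Thomas, William Holland"),
    ("PUBLISHER", "The University of Tennessee Libraries"),
    ("IDNO", "wt025"),
    ("DATE", "July 8, 1839"),
    ("ITEM", "Cherokee Indians"),
    ("ITEM", "Government relations"),
    ("DIV1", "letter body, not mapped"),
]

if __name__ == "__main__":
    print("\n".join(crosswalk(SAMPLE)))
```

Keeping the mapping in a table rather than in code means a different legacy format only needs a different table.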
Translation Tools:
Crosswalks available:
MARC to DC: http://www.loc.gov/marc/dccross.html
Shown in action at: http://alcme.oclc.org/marc2dc/index.html
Others:
http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
http://www.lub.lu.se/tk/metadata/MDin9612.html
http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
Storage of OAI Records
mysql> create table gsm(
    -> id char(10) not null,
    -> primary key (id),
    -> date char(10),
    -> path char(80),
    -> listit text);
$sth = $dbh->prepare("select listit from $set where date <= '$until' and date >= '$from' order by id");
MySQL: small, fast, and free: http://www.mysql.com
Use scripts to load database and retrieve information
Store entire records, already marked up in Unqualified Dublin Core, for quick response; …or
Store fields untagged, with multiple values for a field separated by tags, and retag upon request, for flexibility. This structure allows a record to be entered once and retrieved in various formats upon request.
For local search engines, also store hardcoded XML files in a directory.
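The first storage option (whole records stored pre-tagged and selected by date range) can be sketched as follows. This is an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are invented, and the helper names open_store, load, and list_records are hypothetical. Unlike the query shown on the slide, the dates are bound as parameters rather than interpolated into the SQL string.

```python
import sqlite3

# Assumption: the sets that exist as tables. A table name cannot be bound as a
# SQL parameter, so it is checked against this allow-list instead.
KNOWN_SETS = {"gsm"}


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the gsm table layout from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table gsm ("
        " id char(10) not null primary key,"
        " date char(10),"
        " path char(80),"
        " listit text)"
    )
    return conn


def load(conn, set_name, rows):
    """Insert (id, date, path, listit) rows into the table for one set."""
    if set_name not in KNOWN_SETS:
        raise ValueError(f"unknown set: {set_name}")
    conn.executemany(
        f"insert into {set_name} (id, date, path, listit) values (?, ?, ?, ?)",
        rows,
    )


def list_records(conn, set_name, from_date, until_date):
    """Return stored records dated within [from_date, until_date], ordered by id."""
    if set_name not in KNOWN_SETS:
        raise ValueError(f"unknown set: {set_name}")
    sql = f"select listit from {set_name} where date <= ? and date >= ? order by id"
    return [row[0] for row in conn.execute(sql, (until_date, from_date))]


if __name__ == "__main__":
    conn = open_store()
    load(conn, "gsm", [
        ("gsm0001", "1999-05-01", "sgm/00001.html", "<title>First</title>"),
        ("gsm0002", "2003-01-01", "sgm/00002.html", "<title>Second</title>"),
    ])
    print(list_records(conn, "gsm", "1999-02-02", "2002-01-01"))
```

Storing dates as YYYY-MM-DD strings is what makes the plain string comparison in the query behave as a date comparison.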
Response:
Offer up document descriptions via a standardized set of queries & responses: the Open Archives Initiative Protocol
1) 6 Verbs, with 5 required and/or optional arguments
2) Unique Identifiers, Optional Sets, and Metadata Prefixes
3) Flow control & Resumption Tokens
4) Error Codes
Verbs and arguments: The Open Archives Protocol
1) Identify
2) ListSets
3) ListMetadataFormats: optional: identifier
4) ListIdentifiers: required: metadata prefix (oai_dc); optional: from, until, set, resumption token
5) ListRecords: required: metadata prefix (oai_dc); optional: from, until, set, resumption token
6) GetRecord: required: identifier and metadata prefix
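The verb and argument table above can be expressed as data and used to check a request before building its URL. This is a sketch only: the table follows the slide, the base URL is a placeholder, and the function names check_request and build_url are hypothetical. In version 2.0 of the protocol a resumptionToken is sent on its own rather than alongside the other arguments, which this sketch does not enforce.

```python
from urllib.parse import urlencode

_LIST_OPTIONAL = {"from", "until", "set", "resumptionToken"}

# Verb -> required and optional argument names, as listed on the slide.
VERBS = {
    "Identify": {"required": set(), "optional": set()},
    "ListSets": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListIdentifiers": {"required": {"metadataPrefix"}, "optional": _LIST_OPTIONAL},
    "ListRecords": {"required": {"metadataPrefix"}, "optional": _LIST_OPTIONAL},
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
}


def check_request(verb, args):
    """Return None if the request is acceptable, otherwise an error code."""
    spec = VERBS.get(verb)
    if spec is None:
        return "badVerb"
    names = set(args)
    missing = spec["required"] - names
    unknown = names - spec["required"] - spec["optional"]
    if missing or unknown:
        return "badArgument"
    return None


def build_url(base_url, verb, args):
    """Build a request URL, raising ValueError with the error code if invalid."""
    error = check_request(verb, args)
    if error is not None:
        raise ValueError(error)
    return base_url + "?" + urlencode({"verb": verb, **args})


if __name__ == "__main__":
    # Assumption: placeholder base URL, not the repository's real address.
    print(build_url(
        "http://example.org/oai",
        "ListRecords",
        {"metadataPrefix": "oai_dc", "from": "1999-02-02", "set": "gsm"},
    ))
```

The arguments are passed as a dictionary because "from" is a reserved word in Python and cannot be used as a keyword argument name.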
Identifiers, Sets, and Metadata Prefixes
Current Sets                          Input as "Set"   Sample Identifiers
Bessie Harvey Collection              har              oai:tkn:har/har0001
Cherokee                              che              oai:tkn:che/che0003
Civil War Collection                  civ              oai:tkn:civ/civ0001
Electronic Theses and Dissertations   etd              oai:tkn:etd/etd0002
Emancipator                           emn              oai:tkn:emn/emn0001
Encoded Archival Description          ead              oai:tkn:ead/ead0003
Great Smoky Mountains                 gsm              oai:tkn:gsm/gsm0045
Library Development Review            ldr              oai:tkn:ldr/ldr0002
Roth Photography Collection           rth              oai:tkn:rth/rth0034
Tennessee Documentary History         tdh              oai:tkn:tdh/tdh0005
Videos                                vid              oai:tkn:vid/vid0001
Supported Metadata prefix: oai_dc
Flow Control and Resumption Tokens
For ListIdentifiers, ListSets and ListRecords
<resumptionToken>
LRrtdc20f19990202u20020101
</resumptionToken>
LR or LI: ListRecords or ListIdentifiers
rt: Number or letter combination: which set next
dc: Metadata format
20: Which record number to start with this time
f19990202: From date 1999-02-02
u20020101: Until date 2002-01-01
The token specifies the call to make to the database when this resumption token is returned.
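Decoding such a token back into query parameters can be sketched with a regular expression. The layout is taken from the sample token above; the assumptions are that the set code is always two characters, that "dc" is the only metadata format, and that a malformed token is reported with the badResumptionToken error code. The function name parse_token is hypothetical.

```python
import re

# Token layout, in order: verb code, set code, metadata format,
# starting record number, f + from date, u + until date.
TOKEN_RE = re.compile(
    r"(?P<verb>LR|LI)"
    r"(?P<next_set>[A-Za-z0-9]{2})"  # assumption: two-character set code
    r"(?P<fmt>dc)"
    r"(?P<start>\d+)"
    r"f(?P<from_date>\d{8})"
    r"u(?P<until_date>\d{8})"
)

VERB_NAMES = {"LR": "ListRecords", "LI": "ListIdentifiers"}


def _iso(yyyymmdd):
    """Turn 19990202 into 1999-02-02."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:]}"


def parse_token(token):
    """Decode a resumption token into the parameters of the next database call."""
    match = TOKEN_RE.fullmatch(token.strip())
    if match is None:
        raise ValueError("badResumptionToken")
    return {
        "verb": VERB_NAMES[match["verb"]],
        "next_set": match["next_set"],
        "metadata_format": match["fmt"],
        "start": int(match["start"]),
        "from": _iso(match["from_date"]),
        "until": _iso(match["until_date"]),
    }


if __name__ == "__main__":
    print(parse_token("LRrtdc20f19990202u20020101"))
```

Because the token carries the whole query state, the repository does not need to remember anything between requests.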
Error Codes: version 2.0
badResumptionToken
badVerb
badArgument
idDoesNotExist
cannotDisseminateFormat
noMetadataFormats
noRecordsMatch
noSetHierarchy
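An error is returned to the harvester as a small XML document carrying one of these codes. The sketch below builds such a response in the version 2.0 shape (an OAI-PMH root element containing responseDate, request, and error); the function name error_response, the message text, and the example base URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

ERROR_CODES = {
    "badResumptionToken", "badVerb", "badArgument", "idDoesNotExist",
    "cannotDisseminateFormat", "noMetadataFormats", "noRecordsMatch",
    "noSetHierarchy",
}


def error_response(code, message, base_url, response_date):
    """Serialize an error response for one of the version 2.0 error codes."""
    if code not in ERROR_CODES:
        raise ValueError(f"not a protocol error code: {code}")
    ET.register_namespace("", OAI_NS)  # emit OAI_NS as the default namespace
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    ET.SubElement(root, f"{{{OAI_NS}}}responseDate").text = response_date
    ET.SubElement(root, f"{{{OAI_NS}}}request").text = base_url
    error = ET.SubElement(root, f"{{{OAI_NS}}}error", {"code": code})
    error.text = message
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumption: placeholder base URL and a fixed timestamp for the example.
    print(error_response(
        "badVerb", "Illegal verb", "http://example.org/oai", "2002-02-08T12:00:00Z"
    ))
```

A full response would also include the schema location attributes on the root element, which are left out here for brevity.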
OAI 1.1 Test interface and Local Search Engine: http://oai.sunsite.utk.edu/1.1.html
Search by:
word or phrase
Searching by all or any field and set,
Sorting by date or set
Returning:
Lists of identifiers or short file descriptions,
each with links to full file in HTML, XML, and online document
Crosswalks:
http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
http://www.lub.lu.se/tk/metadata/MDin9612.html
http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
More Information: www.openarchives.org
Pre-developed repositories, harvesters, search engines, and more: http://www.openarchives.org/tools/tools.html
Current service providers, which can offer searches of your records based on your repository's responses:
http://www.openarchives.org/service/listproviders.html