OCLC Online Computer Library Center
CONTENTdm Interoperability--
Leveraging resources; repurposing collections
ALA Annual
New Orleans, LA
June 23rd, Friday, 9 am to noon
Claire Cocco, Product Manager
Geri Ingram, Customer Service Specialist
DiMeMa, Inc.
Agenda Part 1
9:00 to 10:15
I. Mainstream digital objects into existing workflows
Importing from legacy systems
II. Exporting
III. Example of collaborative development for interoperability
METS transform (courtesy of CDL)
[BREAK 10:15 TO 10:30]
Agenda Part 2
10:30 to 11:30
Customizing and integrating your CONTENTdm site
Web templates
Custom Queries and Results
Configuration files
Agenda Part 3
11:30 to Noon
Handling Finding Aids
Importing EAD files into CONTENTdm
Setting the context: fully engaged in digital library transformation
Library services and collections expanding to encompass all
Traditional to digital
Licensed
Reformatted
Sharing
Preserving
Leveraging resources
Staff time and skills throughout the organization and/or consortium
Existing metadata in some form
Existing digital collections (images and transcripts)
Why? For better customer service
In order to mainstream your processing and amplify your efforts.
Your digital collections should ultimately be mainstreamed into regular workflows, similar to the ones used for other materials (whether that’s done centrally or in a distributed fashion).
This includes selection, technical processing (cataloging, organizing, importing), integration with site vis-à-vis presentation and archiving.
Mainstreaming processing of digital formats (Part 1 of 3)
I. Importing from other systems to CONTENTdm
II. Exporting from CONTENTdm
III. Example of collaborative development for interoperability
A. CONTENTdm Standard Export
B. METS transform for import
I. Importing from other systems to CONTENTdm
• Metadata only
• When records describe items that are not yet scanned
• Replace “null” files at later time
• Metadata AND their digital files
From an OPAC or other database system
When you have…
Individual image files cataloged already
And can export from an OPAC or other dbms
Or where you have compound digital objects ready for migration
Migration steps:
Prepare the collection and the import files
Cross-walk metadata to Dublin Core
Configure the CONTENTdm collection fields
Export and prep data in a tab-delimited ASCII file
Import the file to CONTENTdm
Data prep: Common problems in tab-delimited data files
Extra data in columns or rows
Extra tabs at end of line
Extra CRs at end of file (Should only be 1 CR)
Carriage return in metadata, tab in metadata
Files must exist
0 versus O
Error may occur in previous record, check few rows before and after error
File names are required, not full pathnames
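Several of the problems listed above can be caught before import with a short script. The following is a minimal sketch, not part of CONTENTdm: the function name `check_tab_file`, the `expected_columns` parameter, and the optional `image_dir` check are all assumptions made for illustration. It only relies on the points above (no trailing tabs, no blank lines, consistent columns, bare file names in the last column, files must exist).

```python
import os

def check_tab_file(lines, expected_columns, image_dir=None):
    """Return (line_number, problem) tuples for a tab-delimited import file."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line == "":
            problems.append((number, "blank line (extra carriage return?)"))
            continue
        if line.endswith("\t"):
            problems.append((number, "extra tab at end of line"))
        fields = line.split("\t")
        if len(fields) != expected_columns:
            problems.append(
                (number, f"expected {expected_columns} columns, found {len(fields)}")
            )
        # The last column is assumed to hold the file name.
        filename = fields[-1].strip()
        if "/" in filename or "\\" in filename:
            problems.append((number, "last column holds a path, not a bare file name"))
        elif image_dir is not None and filename:
            if not os.path.exists(os.path.join(image_dir, filename)):
                problems.append((number, f"file not found: {filename}"))
    return problems
```

Because an error may originate in an earlier record, it helps to read the reported line numbers together with the few rows around them.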
Data prep: Troubleshooting with Excel
Use Microsoft Excel to open the file and view data
Each row should be an item with last column as filename
Work with small batches to find errors – keep adding items until record with error is found
Use Excel’s “CLEAN” function to remove invisible characters
Import images from directory without using tab delimited file
Checks for any type of imaging errors
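For files too large to clean comfortably in Excel, the same idea as the "CLEAN" function can be sketched in a few lines. This is an approximation under stated assumptions: it drops ASCII control characters (codes 0 to 31) from each field after splitting on tabs, and the helper names `clean_field` and `clean_row` are hypothetical.

```python
def clean_field(value):
    """Drop ASCII control characters (codes 0-31) from one metadata field."""
    return "".join(ch for ch in value if ord(ch) >= 32)

def clean_row(line):
    """Clean every field of one tab-delimited row, keeping the tab separators."""
    fields = line.rstrip("\r\n").split("\t")
    return "\t".join(clean_field(field) for field in fields)
```

Splitting on tabs first keeps the column separators intact; a tab that was typed inside a metadata value cannot be told apart from a separator and still has to be fixed by hand.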
Demo: MARC to DC
Export MARC records to tab-delimited text file (using ILS or MarcEdit)
Format and clean up the text file to conform to your CONTENTdm Collection schema
Import the file (with or without images) to the Collection
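The format-and-clean-up step can be sketched as a simple crosswalk. Everything here is illustrative: the MARC-to-Dublin-Core mapping, the column names, the "; " separator for repeated fields, and the header row are assumptions to adapt to your own collection schema, and the input is assumed to be records already exported as tag-to-values dictionaries rather than raw MARC.

```python
import csv
import io

# Assumed crosswalk; adjust to the collection's field configuration.
MARC_TO_DC = {
    "245": "Title",
    "100": "Creator",
    "650": "Subject",
    "260": "Publisher",
    "520": "Description",
}
# File name goes in the last column.
DC_COLUMNS = ["Title", "Creator", "Subject", "Publisher", "Description", "File Name"]

def crosswalk(record, filename):
    """Map one exported record (tag -> list of values) to a row of DC columns."""
    row = {column: "" for column in DC_COLUMNS}
    for tag, values in record.items():
        column = MARC_TO_DC.get(tag)
        if column:
            row[column] = "; ".join(values)  # repeated fields joined (assumption)
    row["File Name"] = filename
    return row

def to_tab_delimited(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DC_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Tags without a mapping are dropped silently in this sketch; a real crosswalk would log them so nothing is lost unnoticed.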
Importing compound objects
• For documents, postcards, monographs and picture cubes
• Can do singly or in batch
• Much easier to start with singles, then set up for batch when process is smooth
Migrate compound objects from another database system
Where you have many compound digital objects to migrate
Prepare the collection and the import files
Cross-walk metadata to Dublin Core
Configure the CONTENTdm collection fields
Configure folders for scans and transcripts (if appropriate)
Choose an import method based on your data structure
Create tab-delimited ASCII file(s) appropriate to the method
Import the files to CONTENTdm in batches
Multiple compound object wizard
Documented in online tutorial
Today’s demo described in handout
Four import methods for multiple object loading
Compound object (same as single, but upload batched)
Directory Structure (most flexible and efficient)
Object List (useful when NO page-level metadata)
Job List
Time allowing, demonstrate three different object types using 3 of 4 methods
Choose a multiple compound import method based on your data
(* = will demo)

              Compound Object   Directory Structure   Object List (no page-level metadata)
Postcards     YES               YES                   * YES
Documents     YES               * YES                 YES
Monograph     * YES             YES                   YES
[Flowchart: choosing a multiple compound object import method]
Decision boxes:
Do you have page-level metadata for the compound objects?
Are your scan files separated into compound object directories? (No: create compound object directories for EACH compound object.)
Do you have one tab-delimited text file containing ALL the objects?
Are they all the same type of compound object? (No: break up into batches by type.)
Do you have tab-delimited text files for EACH compound object?
Action box: create a text file listing all compound objects and object metadata, or create a text file for each compound object.
Outcomes: DIRECTORY STRUCTURE, OBJECT LIST
Every one of the four CONTENTdm compound object importing methods
• Requires object-level metadata
• Requires preparation
• File–naming, keeping sort order in mind
• Each object has own directory for scans
• May use tab-delimited text file(s)
• Accommodates transcripts
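The point about file naming and sort order is easy to get wrong: plain numbering sorts "p10" before "p2". A minimal sketch of zero-padded naming follows; the function name `page_filename`, the underscore separator, and the default extension are assumptions, not CONTENTdm requirements.

```python
def page_filename(prefix, page_number, total_pages, extension="tif"):
    """Build a zero-padded name so alphabetical order equals page order."""
    width = max(3, len(str(total_pages)))
    return f"{prefix}_{page_number:0{width}d}.{extension}"

# Unpadded names sort incorrectly: page 10 lands before page 2.
unpadded = sorted(["p1.tif", "p2.tif", "p10.tif"])

# Padded names keep their page order when sorted.
padded = sorted(page_filename("diary", n, 12) for n in (10, 2, 1))
```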
A word about descriptive page-level metadata
• Supported by some but not all 4 import methods
• NOT supported by Object List
• At page-level Title is only field required
• Technical metadata can be generated by the Template Creator
More on transcripts
Typescripts and transcripts
Requires a field designated as the data type “Full Text Search”
Inserted into the metadata field of the scanned page
During import
Through use of .txt file found, or
By Template Creator
If OCR Extension in use
Or by “Directory Import” as with early versions of CONTENTdm
Transcripts and typescripts are supported by all four methods (i.e., not considered “metadata” for purposes of this discussion)
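Before a batch import it can be useful to see which scans have a transcript waiting. The sketch below assumes, purely for illustration, that a transcript is a .txt file sharing the scan's base name in the same directory; check the product documentation for the naming your import method actually expects. The function name `pair_transcripts` is hypothetical.

```python
import os

def pair_transcripts(filenames, image_extensions=(".tif", ".jpg")):
    """Map each scan to a same-named .txt transcript, or None when absent."""
    transcripts = {}
    for name in filenames:
        stem, extension = os.path.splitext(name)
        if extension.lower() == ".txt":
            transcripts[stem] = name
    pairs = {}
    for name in sorted(filenames):
        stem, extension = os.path.splitext(name)
        if extension.lower() in image_extensions:
            pairs[name] = transcripts.get(stem)
    return pairs
```

Scans mapped to None are the pages whose full text search field would stay empty unless OCR or manual entry fills it.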
Demo: Import Multiple Compound Objects
Monograph using Compound Object method
Postcards using Object List method
Documents using Directory Structure method
II. Exporting from CONTENTdm
To ASCII tab-delimited with field headers
To XML:
Standard Dublin Core —only DC
Custom—all fields, including local but not structure
CDM Standard—all fields, including structure
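A tab-delimited export with field headers is straightforward to reuse elsewhere. As a minimal sketch (the field names in any real export depend on the collection, and `read_export` is a hypothetical helper), the header line becomes the dictionary keys:

```python
import csv
import io

def read_export(text):
    """Parse tab-delimited text with a header line into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep quotation marks in metadata as-is
    )
    return list(reader)
```

Turning quoting off is a deliberate choice here: descriptive metadata often contains quotation marks that are part of the text, not field delimiters.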
III. Examples of collaboration for interoperability
• Web integration through search engines, RSS
• OAI harvesting
• Enable at collection or server level
• Choose to suppress <pagedata> or not
• WorldCat registration
• Open WorldCat integration
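From the harvester's side, OAI harvesting comes down to a standard OAI-PMH request and a Dublin Core response. The sketch below uses only the protocol's standard `verb`, `metadataPrefix`, and `set` parameters; the base URL in any call is hypothetical (the slides do not give the server's OAI address), and treating a collection as an OAI set is an assumption.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # assumed to correspond to one collection
    return base_url + "?" + urlencode(params)

def titles_from_response(xml_text):
    """Collect every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_NS + "title")]
```

A full harvester would also follow `resumptionToken` values to page through large result sets; that is omitted here for brevity.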
CONTENTdm and a new METS transform
Info available on USC in July
Code at SourceForge
Windows-oriented
The CONTENTdm to METS conversion tool
What is/are METS?
Why is/are METS good?
What is 7train?
How do I use 7train?
What do I get from 7train?
How do I get 7train?
What is/are METS?
METS (Metadata Encoding and Transmission Standard) is an XML-based standard for encoding metadata to describe objects (digital or otherwise) within a digital library.
See http://www.loc.gov/standards/mets/ for more information
What is/are METS?
Yellow elements/tags are required; all others are optional
METS
metsHdr: Metadata about this particular METS - encoder, contact info, etc.
dmdSec: Descriptive metadata - title, author, subjects, etc.
amdSec: Metadata for the management of the object: technical details, object history, etc.
fileSec: A list of files that make up the object
structMap: Description of the structure of the object, i.e. how the files fit together
behaviorSec: What to do with the object: machine actionable instructions
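To make the sections concrete, here is a minimal sketch that assembles an empty METS skeleton for a multi-page object. It is not 7train's output and is not claimed to be schema-valid; the function name `build_mets` and the ID values are assumptions, while the element names and namespaces come from the METS standard. The optional behaviorSec is left out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

def build_mets(object_id, file_urls):
    """Return a skeleton METS element tree for one object and its page files."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)

    def q(tag):
        return f"{{{METS_NS}}}{tag}"

    mets = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(mets, q("metsHdr"))                 # about this METS record
    ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})  # descriptive metadata
    ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})  # administrative metadata
    file_group = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    object_div = ET.SubElement(ET.SubElement(mets, q("structMap")), q("div"))
    for order, url in enumerate(file_urls, start=1):
        file_id = f"FILE{order}"
        file_element = ET.SubElement(file_group, q("file"), {"ID": file_id})
        ET.SubElement(
            file_element,
            q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
        # One structMap div per page, pointing back at its file.
        page = ET.SubElement(object_div, q("div"), {"ORDER": str(order)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_mets("obj1", urls), encoding="unicode")` shows how the fileSec lists the files while the structMap records how they fit together.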
Why METS?
To be able to add your objects to other collections and increase the visibility of your institution's assets.
What is 7train?
7train is an XSL-based tool for converting XML documents - in this case CONTENTdm exports describing objects managed in the CONTENTdm system - into METS objects suitable for submission to a digital library system, such as the California Digital Library's Online Archive of California.
7train is a platform-independent, standalone tool that was designed to work on any system and to be simple to use.
How does 7train work?
It is as easy as dragging your CONTENTdm XML export file onto an executable file.
How does 7train work? What do you get?
Output: A sample METS document
References & Links
7train Home: http://seventrain.sourceforge.net
7train Download: http://seventrain.sourceforge.net/7train_download.html
CONTENTdm: http://www.dimema.com
METS: http://www.loc.gov/standards/mets/
XSL: http://www.w3.org/Style/XSL/
The California Digital Library: http://www.cdlib.org
The Online Archive of California: http://www.oac.cdlib.org
[Diagram: Interoperability for library users]
CONTENTdm (10K/50K/Unlimited Objects): existing libraries, new libraries, librarians, archivists…
CONTENTdm Multi-Site Server: other CONTENTdm sites
Connection labels: OAI, XML DC, DC, MARC records
Destinations: OPACs, Open WorldCat, WorldCat, regional union catalog, other digital archives, Web
BREAK—15 minutes
This concludes Part 1
To come after the break:
Part 2
Customization
Part 3
Finding Aids
Customizing and integrating your CONTENTdm site
(Part 2 of 3)
Web templates
Custom Queries and Results
Configuration files
CONTENTdm Web Templates
Customizable for integration
Designed to support broad range of users
Small to large organizations
Beginners to experts
Use out of the box with minimal customization
Basic customization requires minimal HTML skills
Fully customize including advanced extensions
Based on a PHP API (Hypertext Preprocessor and Application Program Interface)
Basic Customizations
Minimal skills needed
Easy to make changes
Global include files
Variables
Recommend all organizations do basic customizations
Header (name/logo), contact e-mail address, colors, about page, home page
http://www.contentdm.com/help4/custom/templates.html
Getting Started
Access to Web server docs directory
HTML editor or text editor
Design plan
Logo or other graphics
Backup copy of original files
Customization Demo
http://sr.contentdmdemo.com
Files located in /cdm4 directory
/includes/global_header.php
/client/LOC_global.php
/client/STY_global_style.php
about.php
browse.php
results.php
New logo saved in /cdm4/images/
Advanced Customizations
Experience with HTML, PHP, and JavaScript needed
Customize looks for each collection
University of Nevada, Reno
Web Template extensions
E-commerce (University of Utah, Oregon State University)
Comment forms (SENYLRC, Enoch Pratt Free Library, OSU)
Custom metadata display (University of Oregon)
QuickTime video (Williams College)
http://www.contentdm.com/customers/index.html
Examples of Advanced Customizations
University of Nevada, Reno http://imageserver.library.unr.edu/
University of Utah http://www.lib.utah.edu/digital/bodmer/
Oregon State University http://digitalcollections.library.oregonstate.edu/cdm4/client/bracero/
SENYLRC http://www.hrvh.org/
Enoch Pratt Free Library http://www.mdch.org/
Williams College http://contentdm.williams.edu/
Customization Tips
Always make a backup!
Be aware of encoding (UTF-8 vs. ASCII)
See what other users are doing
Share, borrow, and copy ideas and code
http://www.contentdm.com/customers/index.html
Listserv
Document changes
Document which files are edited and what code changes are made to ease upgrading to newer versions
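Keeping the backup copy of the original files also makes it possible to list what has been customized before an upgrade. A minimal sketch, assuming the original templates were saved to one directory and the live, edited copies sit in another; the function name `changed_files` and both directory arguments are hypothetical.

```python
import filecmp
import os

def changed_files(original_dir, customized_dir):
    """List files that were added or modified relative to the backup copy."""
    changed = []
    for folder, _subdirs, files in os.walk(customized_dir):
        for name in files:
            custom_path = os.path.join(folder, name)
            relative = os.path.relpath(custom_path, customized_dir)
            original_path = os.path.join(original_dir, relative)
            if not os.path.exists(original_path):
                changed.append((relative, "added"))
            elif not filecmp.cmp(original_path, custom_path, shallow=False):
                changed.append((relative, "modified"))
    return sorted(changed)
```

The resulting list is a starting point for the change log; it shows which files differ, not what was changed inside them.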
Custom Queries and Results (CQR)
Create predefined, custom queries
Virtual collections
Guide users to specific results
Integrate with other sites
Multiple options
Simple hyperlink, drop-down list, index box, text box, browse
Easy to use
Wizard generates code to copy and paste into Web pages
Documentation
http://www.contentdm.com/help4/custom/cqr.html
http://www.contentdm.com/USC/tutorials/cqr.pdf
CQR DEMO
Generate code using CQR
Copy and paste into Web pages
May need to change path
Customize as desired
Configuration Files
Customizable files that reside on the server
Stop words
Full text field stop words – fullstop.txt
Automatic hyperlink stop words – stopwords.txt
http://www.contentdm.com/help4/custom/stopwords.html
Image viewer
Customize how images are displayed – imageconf.txt
For all collections or per collection
http://www.contentdm.com/help4/custom/zoompan.html
Imageconf.txt Demo
Located in the /conf directory on the CONTENTdm server
Can change globally or for individual collections
If you wish to change the zoom and pan default settings for a particular collection, copy the imageconf.txt file from the Server/conf directory to the index/etc directory of the collection(s) you wish to modify.
Make a backup copy!
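The per-collection override described above is a file copy, and it can be scripted so the backup is never forgotten. In this sketch the function name and both directory arguments are hypothetical; only the file name imageconf.txt and the index/etc target directory come from the slide.

```python
import os
import shutil

def override_imageconf(server_conf_dir, collection_dir):
    """Copy the server-wide imageconf.txt into one collection's index/etc."""
    source = os.path.join(server_conf_dir, "imageconf.txt")
    target_dir = os.path.join(collection_dir, "index", "etc")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "imageconf.txt")
    if os.path.exists(target):
        # Keep the previous per-collection settings as a backup copy.
        shutil.copy2(target, target + ".bak")
    shutil.copy2(source, target)
    return target
```

After the copy, the zoom and pan settings are edited in the collection's own file, leaving the server-wide defaults untouched.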
Introduction to Finding Aids
How many of you have them?
Are they digital documents or paper?
If digital, are they XML?
Basic: create documents, monographs, and use http protocol to link
XML: use EAD DTD, and style sheet to display
Handling Finding Aids (Part 3 of 3)
Importing EAD files to CONTENTdm
Current EAD Support
Import of EAD files
Automatic text extraction from EAD files when:
The file extension of the EAD is .xml.
The file includes a header record beginning with DOCTYPE ead.
The collection has a full text search field.
The full text search field is empty when the item is added to the collection.
Up to 128,000 characters extracted from the following fields and placed in the full text search field
titleproper, title, unittitle, persname, famname, corpname, genreform
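The extraction rules above can be approximated outside CONTENTdm to preview what would land in the full text search field. This is a sketch of the described behavior, not the product's actual implementation: the function names are hypothetical, nested elements (a title inside a unittitle) are counted twice, and a file that uses entities defined only in the DTD would fail to parse.

```python
import re
import xml.etree.ElementTree as ET

EAD_FIELDS = ("titleproper", "title", "unittitle", "persname",
              "famname", "corpname", "genreform")
MAX_CHARS = 128000

def looks_like_ead(filename, text):
    """Check the two file-level conditions: .xml extension and DOCTYPE ead."""
    has_doctype = re.search(r"<!DOCTYPE\s+ead\b", text) is not None
    return filename.lower().endswith(".xml") and has_doctype

def extract_full_text(text):
    """Gather text from the listed EAD elements, capped at MAX_CHARS."""
    root = ET.fromstring(text)
    pieces = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # ignore any namespace
        if local_name in EAD_FIELDS:
            value = " ".join("".join(element.itertext()).split())
            if value:
                pieces.append(value)
    return " ".join(pieces)[:MAX_CHARS]
```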
Current EAD Support
Display determined by style sheet
XSLT
CSS
Client side parsing
Affected by Web browser
Getting Started
EAD XML files
EAD DTD
XSLT style sheet
EAD Demo
Configure Full Text Search field
Store DTD and style sheet on server
Edit path to DTD and XSLT in EAD files
Import (single or batch)
Add metadata
Custom thumbnail if desired
Upload, approve, index
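The step of editing the path to the DTD and XSLT can be batch-applied when there are many EAD files. The sketch below assumes each file declares its DTD in a `<!DOCTYPE ead ...>` line and its style sheet in an `<?xml-stylesheet ... href="..."?>` processing instruction; the function name and the URLs passed in are hypothetical.

```python
import re

def point_to_server_files(ead_text, dtd_url, xslt_url):
    """Rewrite the DTD system identifier and the style sheet href."""
    # Replace the last quoted string in the DOCTYPE (the system identifier).
    ead_text = re.sub(
        r'(<!DOCTYPE\s+ead\b[^>\[]*)"[^"]*"(\s*[>\[])',
        lambda match: f'{match.group(1)}"{dtd_url}"{match.group(2)}',
        ead_text,
        count=1,
    )
    # Replace the href of the xml-stylesheet processing instruction.
    ead_text = re.sub(
        r'(<\?xml-stylesheet\b[^>]*?\bhref=")[^"]*(")',
        lambda match: f"{match.group(1)}{xslt_url}{match.group(2)}",
        ead_text,
        count=1,
    )
    return ead_text
```

Files that lack either declaration pass through unchanged, so it is worth checking the output of a few files before importing the whole batch.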
Custom EAD Extension
Example by Oregon State University
Terry Reese, [email protected]
Customized Web templates
Client side or server side parsing
Integrates display in templates
VBScript for extracting metadata from EAD to tab-delimited text file
www.contentdm.com/USC/templates/index.asp
Oregon State University EAD Collection
http://digitalcollections.library.oregonstate.edu/
Announcing new exposure for your CONTENTdm Collections
Collection of Collections
http://collections.contentdmdemo.com/
(also featured at contentdm.com/customers)
Harvesting metadata from Collection sites at:
http://primarysources.contentdmdemo.com
Uses CONTENTdm Multi-site server