ALA Annual
June 2008
CONTENTdm in ConTEXT
Geri Ingram
OCLC Digital Collection Services
Manager, Customer Services
Who should attend this morning?
To get the most from the next hour and a half, you should have either:
• Experience building CONTENTdm collections
OR
• Attended CONTENTdm Training
• Hands-on: on-site or on-line
• Demonstration only: Basic Use Webinar
Outline
Part One: Review
• Software architecture
• Collections and Projects
Part Two: Demonstration
• Importing and searching full text
• Research papers
• Yearbooks
• Postcards
• Books
CONTENTdm Architecture
Acquisition Stations or “clients”
• JPEG2000 Extension
• OCR Extension
Web-based ‘Add’
Administration tools
• Statistics
• Authorization settings
• Exporting to WorldCat
Custom Web interfaces
CONTENTdm Server
• Unix (Linux, Solaris) or Windows (2000, 2003)
CONTENTdm site pages
Archival repository
OCLC Connexion ‘digital import’
Search engines
• E.g., Google®
WorldCat.org
WorldCat Local
Configuring a collection
What’s a Collection?
A group of objects (items) that
• Share the same metadata schema
• Live on the same CONTENTdm server
How many Collections can I have?
• Up to 200 collections per server
How many items can be in a collection?
• 16 million items per collection
Populating a collection
Through the use of a “Project”
What’s a CONTENTdm Project?
A workspace on your personal computer
• Into which you import up to 5000 items at a time
• Where items reside until you upload to the server
A group of settings that are applied to the items
• E.g., image display resolution, file format, branding
• E.g., automatic metadata input
How many Projects can I have at one time?
Limited only by your disk space on the workstation
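Because a Project accepts up to 5000 items at a time, a large folder of files has to be split across several imports. The following is a minimal sketch of that planning step in Python. It is not part of CONTENTdm; the function names `batches` and `plan_imports` and the `*.pdf` pattern are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator, List

# Per-import limit quoted on the slide above.
MAX_ITEMS_PER_IMPORT = 5000


def batches(paths: List[Path], size: int = MAX_ITEMS_PER_IMPORT) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def plan_imports(folder: str, pattern: str = "*.pdf") -> List[List[Path]]:
    """List the files in `folder` (hypothetical layout) and group them into import-sized batches."""
    files = sorted(Path(folder).glob(pattern))
    return list(batches(files))
```

Each batch returned by `plan_imports` could then be imported into a Project and uploaded before the next batch is started.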
RELATIONSHIP of Collection to Projects
A single Collection
Many Projects
[Diagram: one Collection with Project 1, Project 2, and Project 3]
What’s a CONTENTdm object or item?
CONTENTdm can store/index/search items in various formats
Display any file format:
• Viewed with a Web browser natively or viewed via a plug-in
• Including: JPEG, JPEG2000, TIFF, PDF, WAV or MP3 audio, AVI or MPEG video, HTML, MrSID®
Simple items—e.g., images, sound files, research papers (We’ll load papers today as PDF items.)
Compound objects—multiple simple items assembled together
CONTENTdm Compound Objects
CONTENTdm defined classes
• Documents
• We will load a section of a yearbook
• Postcards
• We will load a handwritten postcard with a typescript
• Monographs (Structured documents)
• We will load a book with chapters
• Picture Cube (six-sided views)
Dublin Core metadata element set
Review: Basics of CONTENTdm
Simple and Qualified Dublin Core element sets offered
• 100 fields per collection
• Only DC.Title required to create a record
• Dublin Core is basis for cross-collection searching
Text is stored in a metadata field
• 128,000 characters per “full text search” field
200 collections per server, i.e., up to 200 different metadata schemas
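The limits above (Title required, 100 fields per collection, 128,000 characters per “full text search” field) can be checked before a record is loaded. The sketch below is only an illustration in Python: the function `check_record` and the field name "Full Text" are hypothetical, and how the CONTENTdm server itself reacts when a limit is exceeded is not stated in this presentation.

```python
from typing import Dict, List

# Limits quoted on the slide above.
MAX_FIELDS_PER_COLLECTION = 100
MAX_FULL_TEXT_CHARS = 128_000


def check_record(record: Dict[str, str], full_text_field: str = "Full Text") -> List[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems: List[str] = []
    # Only the Title is required to create a record.
    if not record.get("Title", "").strip():
        problems.append("Title is required")
    # A record cannot use more fields than the collection allows.
    if len(record) > MAX_FIELDS_PER_COLLECTION:
        problems.append(f"{len(record)} fields exceed the limit of {MAX_FIELDS_PER_COLLECTION}")
    # The full text field (hypothetical name) has a character limit.
    text = record.get(full_text_field, "")
    if len(text) > MAX_FULL_TEXT_CHARS:
        problems.append(f"{full_text_field} has {len(text)} characters, limit is {MAX_FULL_TEXT_CHARS}")
    return problems
```

A check like this is most useful on transcripts prepared in advance, where a long book chapter may exceed the per-field character limit.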
Providing searchable text
Remember: metadata fields can be made searchable
In addition, full text extracted from the digital object itself can be stored in a metadata field designated as the “Full text search” data type, in any of three ways:
1. Extracted (by server) from PDFs (if embedded to begin with)
2. Imported as .txt transcript
• Typescripted from handwritten
or
• OCR’d in advance (external OCR engine)
3. Generated by OCR “on-the-fly” (integrated ABBYY FineReader®)
Review: Populating collections
Acquisition Station Projects (PC client)
Add from CONTENTdm Administration (Browser-based)
Connexion digital import (WorldCat cataloging client function)
Review: 1. Acquisition Station (PC client)
Project workspace
Project settings
Tools to manage
• Image settings
• Metadata settings
Review: 2. Add (Web-based function)
Platform independent
Simple item add function may be used for single import of:
• Images—.jpg, .jp2, .tif (if bandwidth allows)
• PDF—single and multi-page
• Audio
• Video
Review: 3. Connexion digital import function
Simple items: some examples that carry text
Reformatted materials e.g., books, documents, posters, broadsides, memos—scans may all contain text
Born digital files e.g., PDFs, single or multi-page
• Single-page PDFs viewed as items
• May opt for ‘in-line’ Adobe viewer
• Multi-page PDFs may be handled as if compound object of type “document”
• Server side conversion
• Import as simple item regardless of conversion choice
Excerpted from Creating and managing text collections using CONTENTdm
First things first. Recap: Prepare the Collection
For importing searchable text items, whether singly or in batch—at minimum:
1. One empty, searchable field is configured as “Full text search” data type to hold text
2. Collection is configured to treat PDFs as compound objects.
3. Collection is configured to provide Full Resolution file management.
4. Other fields are made searchable, hidden, moved, or added, as needed.
5. OPTIONAL: the Web templates are adjusted to suppress display of components of compound objects in search results.
Recap: Prepare the items
These PDFs have been created with searchable text embedded.
Beware: Not all PDFs are created equal!
Demonstration 1a: Simple items
• One simple item—PDF with ‘hidden’ text
Acquisition Station Import file
Web-based Add
Demonstration 1b: Multiple simple items (Acquisition Station)
A batch of simple items, two ways:
Method A: Import a batch of simple digital items stored in folders
(where only Template Creator is used to automatically generate metadata)
Method B: Import a tab-delimited text file naming and describing the digital items
(where the metadata also resides in the imported tab-delimited file)
Recap: Behind the scenes: prepare the items, organize folders
Method A:
PDFs had been created with text (Adobe, Word conversion)
For importing a batch of PDFs in one load,
• All PDFs were stored in one folder.
• Digitization Training
Recap: Behind the scenes: prepare the items, organize folders
Method B:
PDFs had been created with text (Adobe, Word conversion)
For importing a batch of PDFs in one load,
• All PDFs were stored in one folder.
For loading with tab-delimited files:
• Prepare .txt file of metadata
• Place it in a directory different from the .pdf files
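The tab-delimited metadata file for Method B can be prepared with any tool. The following is a minimal Python sketch, assuming a header row followed by one row per item. The column names ("Title", "Creator", "File Name") and the function names are hypothetical; the columns your collection expects depend on its own field configuration.

```python
from pathlib import Path
from typing import Dict, List


def clean(value: str) -> str:
    """Collapse tabs, newlines, and repeated spaces so a value stays in one cell."""
    return " ".join(value.split())


def to_tab_delimited(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Build the file contents: one header line, then one line per item."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(clean(row.get(col, "")) for col in columns))
    return "\n".join(lines) + "\n"


def write_metadata(rows: List[Dict[str, str]], columns: List[str],
                   out_file: str, pdf_folder: str) -> None:
    """Write the .txt file, refusing to place it in the same directory as the PDFs."""
    out_path = Path(out_file)
    if out_path.resolve().parent == Path(pdf_folder).resolve():
        raise ValueError("keep the metadata file in a different directory from the .pdf files")
    out_path.write_text(to_tab_delimited(rows, columns), encoding="utf-8")
```

Stripping tabs and line breaks from each value keeps every item on a single row, which is the main way a hand-built tab-delimited file goes wrong.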
Demonstration 2: Single compound objects
Yearbook (OCR’d transcript produced on the fly)
Handwritten Postcard (with a previously created typescript file)
Book (Separate transcript produced in advance)
Questions & Answers
Getting help with Text
• User Support Center
• Downloading the appropriate Acquisition Station
• JPEG2000
• Installing, activating the OCR extension
• Tutorials to study
• Help files related to text works
• Write [email protected]
Questions?
Collections of documents: text-based letters, newspapers, diaries, yearbooks, PDFs, and more
60-Day Free CONTENTdm Evaluation
https://www3.oclc.org/app/contentdm/evaluation/
Contact: Ron Gardner, OCLC
1-800-848-5878
For more information about CONTENTdm…
www.oclc.org/contentdm/