ala annual june 2008 contentdm in context geri ingram oclc digital collection services manager,...


ALA Annual

June 2008

CONTENTdm in ConTEXT

Geri Ingram

OCLC Digital Collection Services

Manager, Customer Services

Who should attend this morning?

To get the most from the next hour and a half, you should have either:

•Experience building CONTENTdm collections

OR

•Attended CONTENTdm Training

• Hands-on: on-site or on-line

• Demonstration only: Basic Use Webinar

Outline

Part One: Review

• Software architecture

• Collections and Projects

Part Two: Demonstration

• Importing and searching full text

• Research papers

• Yearbooks

• Postcards

• Books

CONTENTdm Architecture

[Diagram: components of the CONTENTdm architecture]

• Acquisition Stations or “clients”

• JPEG2000 Extension

• OCR Extension

• Web-based ‘Add’

• OCLC Connexion ‘digital import’

• CONTENTdm Server: Unix (Linux, Solaris) or Windows (2000, 2003)

• Administration tools: Statistics, Authorization settings, Exporting to WorldCat

• CONTENTdm site pages

• Custom Web interfaces

• Archival repository

• Search engines, e.g., Google®

• WorldCat.org

• WorldCat Local

Configuring a collection

What’s a Collection?

A group of objects (items) that

• Share the same metadata schema

• Live on the same CONTENTdm server

How many Collections can I have?

• Up to 200 collections per server

How many items can be in a collection?

• 16 million items per collection

Populating a collection

Through the use of a “Project”

What’s a CONTENTdm Project?

A workspace on your personal computer

• Into which you import up to 5000 items at a time

• Where items reside until you upload to the server

A group of settings that are applied to the items

• E.g., image display resolution, file format, branding

• E.g., automatic metadata input

How many Projects can I have at one time?

Limited only by your disk space on the workstation
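Because a Project takes up to 5000 items per import, a larger backlog has to be split before loading. The sketch below shows one way to do that split in Python. It is an illustration only: the file names are hypothetical, and the script is not part of CONTENTdm or the Acquisition Station.

```python
from typing import Iterator, List

MAX_ITEMS_PER_IMPORT = 5000  # per-import limit quoted on the slide


def batches(paths: List[str], size: int = MAX_ITEMS_PER_IMPORT) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


# Hypothetical file names, for illustration only.
files = [f"scan_{n:05d}.tif" for n in range(12001)]
groups = list(batches(files))
print([len(g) for g in groups])  # [5000, 5000, 2001]
```

Each resulting group can then be imported in its own pass.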

Relationship of Collection to Projects

A single Collection

Many Projects

[Diagram: one Collection fed by Project 1, Project 2, and Project 3]

What’s a CONTENTdm object or item?

CONTENTdm can store/index/search items in various formats

Display any file format:

• Viewed with a Web browser natively or viewed via a plug-in

• Including: JPEG, JPEG2000, TIFF, PDF, WAV or MP3 audio, AVI or MPEG video, html, MrSID®

Simple items—e.g., images, sound files, research papers (We’ll load papers today as PDF items.)

Compound objects—multiple simple items assembled together

CONTENTdm Compound Objects

CONTENTdm defined classes

• Documents

• We will load a section of a yearbook

• Postcards

• We will load a handwritten postcard with a typescript

• Monographs (Structured documents)

• We will load a book with chapters

• Picture Cube (six-sided views)

Dublin Core metadata element set

Review: Basics of CONTENTdm

Simple and Qualified Dublin Core element sets offered

• 100 fields per collection

• Only DC.Title required to create a record

• Dublin Core is basis for cross-collection searching

Text is stored in a metadata field

• 128,000 characters per “full text search” field

200 collections per server, i.e., up to 200 different metadata schemas
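The limits above (Title required, 100 fields per collection, 128,000 characters per full-text field) can be checked before import. The following is a minimal sketch of such a pre-flight check, assuming each record is held as a plain Python dictionary. The field names "Title" and "Transcript" are assumptions for illustration, and the function is not a CONTENTdm API.

```python
from typing import Dict, List

MAX_FIELDS = 100               # fields per collection (from the slide)
MAX_FULL_TEXT_CHARS = 128_000  # characters per "full text search" field (from the slide)


def check_record(record: Dict[str, str], full_text_field: str = "Transcript") -> List[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    # Title is the only field the slide names as required.
    if not record.get("Title", "").strip():
        problems.append("Title is required")
    if len(record) > MAX_FIELDS:
        problems.append(f"{len(record)} fields exceeds the limit of {MAX_FIELDS}")
    text = record.get(full_text_field, "")
    if len(text) > MAX_FULL_TEXT_CHARS:
        problems.append(
            f"{full_text_field} has {len(text)} characters; limit is {MAX_FULL_TEXT_CHARS}"
        )
    return problems


# Hypothetical records, for illustration only.
print(check_record({"Title": "Class of 1952", "Transcript": "x" * 130_000}))
print(check_record({"Title": "", "Creator": "Unknown"}))
```

Returning a list of messages rather than raising on the first failure lets a batch report every problem in one pass.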

Providing searchable text

Remember: metadata fields can be made searchable

In addition, full text extracted from the digital object itself can be stored in a metadata field designated as the “Full text search” data type, in any of three ways:

1. Extracted (by server) from PDFs (if embedded to begin with)

2. Imported as .txt transcript

• Typescripted from handwritten

or

• OCR’d in advance (external OCR engine)

3. Generated by OCR “on-the-fly” (integrated ABBYY FineReader®)

Review: Populating collections

Acquisition Station Projects (PC client)

Add from CONTENTdm Administration (Browser-based)

Connexion digital import (WorldCat cataloging client function)

Review: 1. Acquisition Station (PC client)

Project workspace

Project settings

Tools to manage

• Image settings

• Metadata settings

Review: 2. Add (Web-based function)

Platform independent

Simple item add function may be used for single import of:

• Images—.jpg, .jp2, .tif (if bandwidth allows)

• PDF—single and multi-page

• Audio

• Video

Review: 3. Connexion digital import function

Simple items: some examples that carry text

Reformatted materials e.g., books, documents, posters, broadsides, memos—scans may all contain text

Born digital files e.g., PDFs, single or multi-page

• Single-page PDFs viewed as items

• May opt for ‘in-line’ Adobe viewer

• Multi-page PDFs may be handled as if compound object of type “document”

• Server side conversion

• Import as simple item regardless of conversion choice

Excerpted from Creating and managing text collections using CONTENTdm

First things first. Recap: Prepare the collection

For importing searchable text items, whether singly or in batch—at minimum:

1. One empty, searchable field is configured as “Full text search” data type to hold text

2. Collection is configured to treat PDFs as compound objects.

3. Collection is configured to provide Full Resolution file management.

4. Other fields are made searchable, hidden, moved, or added, as needed.

5. OPTIONAL: the Web templates are adjusted to suppress display of components of compound objects in search results.

Recap: Prepare the items

These PDFs have been created with searchable text embedded.

Beware: Not all PDFs are created equal!

Demonstration 1a: Simple items

• One simple item—PDF with ‘hidden’ text

Acquisition Station Import file

Web-based Add

Demonstration 1b: Multiple simple items (Acquisition Station)

A batch of simple items, two ways:

Method A: Import a batch of simple digital items stored in folders

(where Template Creator only is used to automatically generate metadata)

Method B: Import a tab-delimited text file naming and describing the digital items

(where metadata also resides in imported tab-d file)

Recap: Behind the scenes: prepare the items, organize folders

Method A:

PDFs had been created with text (Adobe, Word conversion)

For importing a batch of PDFs in one load,

• All PDFs were stored in one folder.

• Digitization Training

Recap: Behind the scenes: prepare the items, organize folders

Method B:

PDFs had been created with text (Adobe, Word conversion)

For importing a batch of PDFs in one load,

• All PDFs were stored in one folder.

For loading with tab-d files:

• Prepare .txt file of metadata

• Place it in a directory different from the .pdf files
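For Method B, the tab-delimited file can be drafted by script and then filled in by hand. The sketch below lists the PDFs in one folder and writes a starter file to a separate folder. The column headings "Title" and "File name", the folder names, and the placeholder titles are all assumptions for illustration. Check the Acquisition Station help files for the layout your version expects.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def write_metadata_file(pdf_dir: Path, out_file: Path) -> List[str]:
    """Write one tab-delimited row per PDF in pdf_dir; return the file names listed."""
    # The slide advises keeping the metadata file apart from the PDFs.
    if out_file.parent.resolve() == pdf_dir.resolve():
        raise ValueError("keep the metadata file in a different directory from the PDFs")
    names = sorted(p.name for p in pdf_dir.iterdir() if p.suffix.lower() == ".pdf")
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Title", "File name"])  # assumed column headings
        for name in names:
            # Placeholder title derived from the file name; replace with real metadata.
            writer.writerow([Path(name).stem.replace("_", " "), name])
    return names


# Demonstration in a throwaway directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    pdfs = Path(tmp, "pdfs")
    meta = Path(tmp, "metadata")
    pdfs.mkdir()
    meta.mkdir()
    for name in ("annual_report.pdf", "field_notes.pdf"):
        (pdfs / name).write_bytes(b"%PDF-1.4\n")
    write_metadata_file(pdfs, meta / "items.txt")
    print((meta / "items.txt").read_text(encoding="utf-8"))
```

Using the `csv` module instead of joining strings by hand means that a value containing a tab or a quotation mark is quoted rather than silently breaking the column layout.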

Demonstration 2: Single compound objects

Yearbook (OCR’d transcript produced on the fly)

Handwritten Postcard (with a previously created typescript file)

Book (Separate transcript produced in advance)

Questions & Answers

Getting help with Text

• User Support Center

• Downloading the appropriate Acquisition Station

• JPEG2000

• Installing, activating the OCR extension

• Tutorials to study

• Help files related to text works

• Write contentdmsupport@oclc.org

Questions?

ingramg@oclc.org

Collections of documents: Text-based letters, newspapers, diaries, yearbooks, PDFs, and more

60-Day Free CONTENTdm Evaluation

https://www3.oclc.org/app/contentdm/evaluation/


Contact: Ron Gardner, OCLC

gardnerr@oclc.org

1-800-848-5878

For more information about CONTENTdm…

www.oclc.org/contentdm/
