METS at UC Berkeley

Generating METS Objects

Background

• Kinds of materials:
– Primarily imaged content & TEI-encoded content
• Archival materials: manuscripts and pictorial collections
• Oral histories
• Kinds of metadata:
– Structural metadata: physical structure
– Descriptive metadata
– Basic technical metadata about digital files and how they were produced

Tools For Producing METS Objects

• GenDB
– Gathers structural, descriptive and technical metadata
• GenX
– Generates METS objects from GenDB

GenDB

• Consists of:
– Relational database (currently SQL Server)
– Locally developed software for gathering metadata and facilitating digital processing

GenDB Database Structure: Structural Metadata

[Diagram: two objects, each a hierarchy of divisions recorded in the Structural Md Table]
• Object 1: Div 1 (root); Div 2 (parent = div 1); Div 3 (parent = div 1)
• Object 2: Div 1 (root); Div 2 (parent = div 1); Div 3 (parent = div 2); Div 4 (parent = div 2)
• Structural Md Table: Object 1 with Div 1, Div 2, Div 3; Object 2 with Div 1, Div 2, Div 3, Div 4
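The parent references in the diagram are enough to rebuild each object's hierarchy. The following is a minimal sketch of that idea, not GenDB's actual code: the row layout `(object, div, parent)` and the function names are assumptions made for illustration, and only the parent relationships come from the diagram.

```python
from collections import defaultdict

# Hypothetical rows of a structural metadata table: (object, div, parent div).
# The column layout is an assumption; the parent links mirror the diagram.
ROWS = [
    ("Object 1", "Div 1", None),
    ("Object 1", "Div 2", "Div 1"),
    ("Object 1", "Div 3", "Div 1"),
    ("Object 2", "Div 1", None),
    ("Object 2", "Div 2", "Div 1"),
    ("Object 2", "Div 3", "Div 2"),
    ("Object 2", "Div 4", "Div 2"),
]


def build_tree(rows, obj):
    """Return (root, children) for one object; children maps a div to its child divs."""
    children = defaultdict(list)
    root = None
    for row_obj, div, parent in rows:
        if row_obj != obj:
            continue
        if parent is None:
            root = div  # the row without a parent is the root division
        else:
            children[parent].append(div)
    return root, dict(children)


def outline(rows, obj):
    """Render one object's divisions as an indented outline, depth first."""
    root, children = build_tree(rows, obj)
    lines = []

    def walk(div, depth):
        lines.append("  " * depth + div)
        for child in children.get(div, []):
            walk(child, depth + 1)

    if root is not None:
        walk(root, 0)
    return lines
```

Calling `outline(ROWS, "Object 2")` yields Div 1 at the top level, Div 2 beneath it, and Div 3 and Div 4 beneath Div 2.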

GenDB Database Structure: Descriptive Metadata

[Diagram: descriptive metadata attached to the divisions in the Structural Md Table]
• Object 1: Div 1, Div 2, Div 3, each with a Core Desc Md record
• Object 2: Div 1, Div 2, Div 3, Div 4, each with a Core Desc Md record
• Name Table: Name 1, Name 2, Name 3
• Note Tables: Note 1, Note 2, Note 3

GenDB Database Structure: Content File/Technical Md

[Diagram: content files and technical metadata attached to the divisions in the Structural Md Table]
• Object 1: Div 1, Div 2, Div 3
• Master Image Table: Mstr 1, Mstr 2, each with a Technical Md record
• Derivative Image Table: Drv 1, Drv 2, Drv 3, Drv 4, each with a Technical Md record
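One way to picture the three tables in this diagram is as a small relational schema. The sketch below uses an in-memory SQLite database purely for illustration (GenDB ran on SQL Server). Every table name, column name, file name and technical metadata value is hypothetical, and the assumption that each master belongs to a division and each derivative to a master is not stated in the slides.

```python
import sqlite3

# Hypothetical schema: names and foreign-key links are assumptions.
SCHEMA = """
CREATE TABLE structural_md (
    div_id INTEGER PRIMARY KEY, object TEXT, label TEXT);
CREATE TABLE master_image (
    master_id INTEGER PRIMARY KEY,
    div_id INTEGER REFERENCES structural_md(div_id),
    file_name TEXT, technical_md TEXT);
CREATE TABLE derivative_image (
    derivative_id INTEGER PRIMARY KEY,
    master_id INTEGER REFERENCES master_image(master_id),
    file_name TEXT, technical_md TEXT);
"""


def load_sample(conn):
    """Create the tables and insert placeholder rows shaped like the diagram."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO structural_md VALUES (?, ?, ?)",
        [(1, "Object 1", "Div 1"), (2, "Object 1", "Div 2"), (3, "Object 1", "Div 3")],
    )
    conn.executemany(
        "INSERT INTO master_image VALUES (?, ?, ?, ?)",
        [(1, 2, "mstr1.tif", "placeholder"), (2, 3, "mstr2.tif", "placeholder")],
    )
    conn.executemany(
        "INSERT INTO derivative_image VALUES (?, ?, ?, ?)",
        [(1, 1, "drv1.jpg", "placeholder"), (2, 1, "drv2.jpg", "placeholder"),
         (3, 2, "drv3.jpg", "placeholder"), (4, 2, "drv4.jpg", "placeholder")],
    )


def files_for_object(conn, obj):
    """List (division label, master file, derivative file) for one object."""
    sql = """
        SELECT s.label, m.file_name, d.file_name
        FROM structural_md AS s
        JOIN master_image AS m ON m.div_id = s.div_id
        JOIN derivative_image AS d ON d.master_id = m.master_id
        WHERE s.object = ?
        ORDER BY s.div_id, m.master_id, d.derivative_id
    """
    return conn.execute(sql, (obj,)).fetchall()
```

With `conn = sqlite3.connect(":memory:")` and `load_sample(conn)`, the call `files_for_object(conn, "Object 1")` walks from each division through its master image to the derivatives made from it.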

Populating the Database Tables

• Web interface: manual input of structural and descriptive metadata

• Digitization Management modules

– Generate work orders to guide digitization process

– Import content file information and technical metadata coming out of digitization process

• Batch loader: batch input based on TEI encodings, legacy metadata

Web Interface: WebGenDB

[Architecture diagram]
• Components: Web Interface, Java Servlet, Java Server, SQL Server Database, XML Config Files
• Connections: RMI, JDBC

Digitization Management Modules

[Architecture diagram]
• Components: Web Interface, Java Servlet, Java Server, SQL Server Database
• Outputs and inputs: Imaging/Transcription Work Orders, Vendor, Technical MD Spreadsheets

Batch Loader

[Architecture diagram]
• Components: Web Interface, Java Servlet, Java Server, SQL Server Database, Java Batch Loader
• Inputs: TEI Docs, XSLT, XML Batch Load File

WebGenDB

The concepts that drove the design:
• Shielding user from METS complexity
• Highly configurable
• Unicode support
• Access driven by login privileges
• Use of Open Source software and components
• Distributed approach

XML Configuration Files

• Three levels
– Elements common to all projects
– Elements common to all screens in a project
– Elements specific to a screen in a project
• Define fields common to all projects
• Define fields used in a specific project
• Define screens by project & object type

[Diagram: relation among XML files]
• AlProjects.xml
– Proj1.xml: ObjectType1.xml, ObjectType2.xml
– Proj2.xml: ObjectType1.xml, ObjectType2.xml

Relation among XML files
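The three configuration levels can be thought of as layered dictionaries of field definitions. The sketch below is an illustration only: the rule that a more specific level overrides a more general one is an assumption (the slides say only that fields are defined at each level), and `resolve_fields` and the sample values are hypothetical.

```python
def resolve_fields(all_projects, project, screen):
    """Merge field definitions from the three levels.

    Assumed rule: later (more specific) levels override earlier ones,
    attribute by attribute.
    """
    merged = {}
    for level in (all_projects, project, screen):
        for name, definition in level.items():
            merged[name] = {**merged.get(name, {}), **definition}
    return merged


# Hypothetical definitions for each level.
ALL_PROJECTS = {"Title": {"type": "text", "size": 60}}
PROJECT = {"Image": {"type": "checkbox", "size": 1}, "Title": {"size": 80}}
SCREEN = {"Title": {"label": "Title "}}

FIELDS = resolve_fields(ALL_PROJECTS, PROJECT, SCREEN)
```

Here `FIELDS["Title"]` keeps the `type` from the all-projects level, takes its `size` from the project level, and gains a `label` from the screen level.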

<ObjectType>
  <name>workorder</name>
  <fileLocation>/data/_w/GenDB/WEB-INF/classes/edu/berkeley/library/propertyFiles/CalCultureWorkOrderScreensFile.xml</fileLocation>
</ObjectType>

<Field>
  <name>Image</name>
  <type>checkbox</type>
  <label>Image </label>
  <size>1</size>
</Field>

<Field>
  <name>Text</name>
  <type>checkbox</type>
  <label>Text </label>
  <size>1</size>
</Field>

<Field>
  <name>Title</name>
  <type>text</type>
  <label>Title </label>
  <size>60</size>
</Field>

Project XML file example
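To show how `<Field>` definitions like these could drive a data-entry screen, here is a minimal sketch that parses them and renders HTML inputs. It is not WebGenDB's code (which was written in Java): the `<Fields>` wrapper element, the function names and the generated HTML are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# The <Fields> wrapper is hypothetical; the <Field> elements follow the example.
FIELDS_XML = """
<Fields>
  <Field><name>Image</name><type>checkbox</type><label>Image </label><size>1</size></Field>
  <Field><name>Title</name><type>text</type><label>Title </label><size>60</size></Field>
</Fields>
"""


def parse_fields(xml_text):
    """Turn each <Field> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    fields = []
    for node in root.iter("Field"):
        fields.append({
            "name": node.findtext("name", default="").strip(),
            "type": node.findtext("type", default="text").strip(),
            "label": node.findtext("label", default="").strip(),
            "size": int(node.findtext("size", default="0")),
        })
    return fields


def render_field(field):
    """Render one field as an HTML form control (illustrative markup only)."""
    label = escape(field["label"])
    name = escape(field["name"], quote=True)
    if field["type"] == "checkbox":
        return f'<label>{label} <input type="checkbox" name="{name}"></label>'
    return f'<label>{label} <input type="text" name="{name}" size="{field["size"]}"></label>'
```

A screen could then be assembled with `[render_field(f) for f in parse_fields(FIELDS_XML)]`, so adding a field to the XML file adds a control to the form without touching code.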

Software used

• MSSQL running on NT
• Tomcat 4.1.2 implementing servlets 2.3
• Jsdk 1.4
• Xalan 2.4
• Xerces 1.0.3
• FOP 0.12.1
• JDOM beta 8
• Opta 2000

Relationship of GenDB to METS

• Metadata not directly stored in METS, MODS or MIX schema formats.
– Much of the database structure was developed before these standards emerged
– Database structure and content adjusted to be compatible with all these formats

GenX: From GenDB to METS

• Allows Digital Publishing Group staff to select the objects in the GenDB database that are ready for export and to export them as METS objects.

GenX Architecture

[Architecture diagram]
• Components: App Interface, Java Application, GenDB, METS XML Repository
• Connection: JDBC

GenX Output

• METS output corresponding to version 1.3

• Descriptive metadata exported to METS descMD in MODS 2.0 format

• Technical Metadata exported to METS techMD in MIX format

• Planned:
– Text technical md to METS descMD in NYU TextMD
– Rights to METS rightsMD in ODRL subset
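As a rough illustration of the export step, the sketch below turns parent-linked division rows into a nested METS `structMap`. It is not GenX's code: the function name, the row layout and the sample identifiers are assumptions, and it emits only a fragment (a complete METS object would also carry descriptive, technical and file sections).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_mets(object_id, divs):
    """Build a <mets> element holding one structMap.

    divs is a list of (div_id, label, parent_id) tuples (hypothetical layout).
    The root has parent_id None, and parents must appear before their children.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    elements = {}
    for div_id, label, parent_id in divs:
        # Attach the root div to the structMap, every other div to its parent.
        parent = struct_map if parent_id is None else elements[parent_id]
        elements[div_id] = ET.SubElement(
            parent, f"{{{METS_NS}}}div", {"ID": div_id, "LABEL": label}
        )
    return mets


# Sample rows with made-up identifiers.
SAMPLE_DIVS = [
    ("div1", "Div 1", None),
    ("div2", "Div 2", "div1"),
    ("div3", "Div 3", "div2"),
    ("div4", "Div 4", "div2"),
]
```

`ET.tostring(build_mets("obj2", SAMPLE_DIVS), encoding="unicode")` serializes the result with the `mets:` prefix on every element.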

Links

• GenDB Web Interface Demo
– http://sunsite2.berkeley.edu/GenD
– login: demo
– password: demo
• Developers:
– [email protected]
– [email protected]
– [email protected]