Using Scalable and Secure Web Technologies to Design Global Format Registry

Muluwork Geremew, Sangchul Song and Joseph JaJa
Institute for Advanced Computer Science Studies
Department of ECE, University of Maryland
Sponsored by Library of Congress and NSF



Motivation

• Handling of digital formats is an essential part of long-term preservation.
• Format obsolescence
  – Technology evolution and the obsolescence of systems and application software may leave users unable to access their old files.
  – Software developers may go out of business and no longer support their applications.
• Digital preservation requires
  – Different essential aspects of objects.
  – Tools for capturing the essential format characteristics of information stored as a digital object, and for processing it.

Existing Methodologies

• Standardizing digital content on a few common formats.
  – JPEG2000, OMF, and PDF/A are among the few selected open standard formats.
• Migration
  – Transforms older versions to newer formats.
  – Tends to be costly and prone to errors.
• Emulation
  – The original bit-streams are executed using an emulator.
  – Implementing such a strategy is extremely challenging and can be viewed as a transformation.

Our Goal

• A flexible framework for incorporating advances achieved through the existing approaches.
• Development of an efficient, scalable and platform-independent prototype to enable the tracking and handling of format obsolescence.
  – Development of a Global Digital Format Registry (GDFR)
  – FOrmat CUration Service (FOCUS)
  – Development of enabler modules that can interface between GDFR and end-user applications.

FOCUS Architecture

[Figure: FOCUS architecture diagram]

FOCUS on LDAP and SOAP

• Interoperability
  – Protocols are platform independent.
• Performance
  – Most operations are read-only queries. LDAP gives high performance in this environment.
• Extensibility
  – The LDAP schema can be easily extended.
• Scalability
  – Through the use of distributed LDAP.
• Security
  – SOAP can run on top of SSL (HTTPS).
  – The LDAP-based Format Registry can be easily integrated with any other LDAP-based authentication/authorization mechanisms.
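
As a rough illustration of the SOAP side, the sketch below builds a SOAP 1.1 request envelope using only the Python standard library. The service namespace `urn:example:focus`, the operation name `identifyFormat`, and the parameter `fileName` are hypothetical placeholders; the slides do not describe the actual FOCUS message format.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
FOCUS_NS = "urn:example:focus"  # hypothetical service namespace, not from the slides


def build_request(operation: str, params: dict) -> bytes:
    """Serialize a SOAP 1.1 envelope whose body holds one operation element."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{FOCUS_NS}}}{operation}")
    # Each parameter becomes a child element of the operation element.
    for name, value in params.items():
        ET.SubElement(op, f"{{{FOCUS_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameter names, for illustration only.
request_bytes = build_request("identifyFormat", {"fileName": "report.pdf"})
```

Sending these bytes in an HTTP POST to an `https://` endpoint is what places the exchange on top of SSL; the envelope itself is unchanged.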

Global Digital Format Registry

• GDFR serves to provide detailed information about formats.
• Existing format registries:
  – UPenn's FRED (http://tom.library.upenn.edu/fred)
  – PRONOM (http://www.nationalarchives.gov.uk/pronom/)
  – Wotsit's Format (http://www.wotsit.org)
• It is not clear how extensible or scalable they are, or how they can be interfaced with existing preservation systems.

FOCUS

• The registry contains information on
  – File formats
  – Software tools
• FOCUS provides multiple ways to access the GDFR.
  – Directly, through the LDAP interface
  – Indirectly, through the SOAP interface

[Figure: Software, Web Service Agent, Global Digital Format Registry]

GDFR - Internal Structure

dc=umiacs, dc=umd, dc=edu
  ou=Format-Registry
    ou=Applications
      Adobe Acrobat v6.0
      Adobe Photoshop v7.0
      Jhove 1.0
    ou=Formats
      Adobe PDF v1.4
      CompuServe GIF 1989a
      JPEG Image Format 2000

Entry annotations shown on the slide:
  – General descriptive properties. Processing: rendering, editing, conversion and validation services/systems.
  – General descriptive properties. Processing: format taken as input and/or output.
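
To make the directory layout concrete, here is a minimal sketch that composes distinguished names (DNs) for entries in the tree above. The base DN and the two `ou` branches come from the slide; the use of `cn` as the naming attribute for individual entries is an assumption, since the slide does not show the schema.

```python
BASE_DN = "dc=umiacs,dc=umd,dc=edu"            # from the slide
REGISTRY_DN = f"ou=Format-Registry,{BASE_DN}"  # from the slide

# Entries as listed on the slide, grouped by branch.
TREE = {
    "Applications": ["Adobe Acrobat v6.0", "Adobe Photoshop v7.0", "Jhove 1.0"],
    "Formats": ["Adobe PDF v1.4", "CompuServe GIF 1989a", "JPEG Image Format 2000"],
}


def entry_dn(branch: str, name: str) -> str:
    """Compose an entry DN; 'cn' as the RDN attribute is an assumption.

    Names containing DN special characters (such as ',' or '+') would need
    escaping first; the names used here contain none.
    """
    return f"cn={name},ou={branch},{REGISTRY_DN}"


all_dns = [entry_dn(branch, name) for branch, names in TREE.items() for name in names]
```

A client using the direct LDAP interface would read or search below `REGISTRY_DN`, restricting the search base to one `ou` branch when only formats or only applications are of interest.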

Web-Service Agent

• Mediator between user and registry
• Serviced via SOAP
• Contains a file format identifier module, FIDER
  – Java module for format identification
  – Uses the file's magic number
  – Tests signatures sequentially, from restrictive to general

[Figure: Client, Format Inquiry, Web Service Agent, Global Digital Format Registry]
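
The magic-number approach can be sketched as follows. This is not FIDER itself (which the slide describes as a Java module); it is a small Python illustration that assumes "restrictive to general" means longer, more specific signatures are tried before shorter, more generic ones. The signature table is a tiny hand-picked sample, and the format labels are illustrative.

```python
from typing import Optional

# (magic bytes at offset 0, format label); a small illustrative sample.
SIGNATURES = [
    (b"%PDF-", "Adobe PDF (version not identified)"),
    (b"%PDF-1.4", "Adobe PDF v1.4"),
    (b"GIF89a", "CompuServe GIF 1989a"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),
]

# Assumption: a longer signature is more restrictive, so it is tested first.
ORDERED = sorted(SIGNATURES, key=lambda sig: len(sig[0]), reverse=True)


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None if none match."""
    for magic, label in ORDERED:
        if data.startswith(magic):
            return label
    return None
```

With this ordering, a file beginning with `%PDF-1.4` is reported as PDF v1.4, while any other `%PDF-` header falls through to the generic PDF entry.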

Web-Service Agent

• Tailorability
  – Specific needs of an existing preservation system can be met by custom-tailoring the Web Service.
• Interoperability
  – Independent of OS and programming language
• Convenience
  – Multiple LDAP queries can be reduced to one Web Service function call.
  – Any updates can be made in a single place, without having to distribute new modules to end users.
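
The convenience point can be illustrated with a small facade. In the sketch below, two dictionaries stand in for the LDAP directory, and `find_validators` plays the role of a single Web Service call that hides the separate lookups. All entry contents, attribute names, and the example URL are hypothetical.

```python
# In-memory stand-ins for the two registry branches (hypothetical contents).
FORMATS = {"Adobe PDF v1.4": {"validators": ["Jhove 1.0"]}}
APPLICATIONS = {"Jhove 1.0": {"location": "https://example.org/jhove"}}


def query_format(name: str) -> dict:
    """Stand-in for one LDAP search under ou=Formats."""
    return FORMATS[name]


def query_application(name: str) -> dict:
    """Stand-in for one LDAP search under ou=Applications."""
    return APPLICATIONS[name]


def find_validators(format_name: str) -> list:
    """One service call that performs the format lookup and one lookup per tool."""
    entry = query_format(format_name)
    return [{"name": tool, **query_application(tool)} for tool in entry["validators"]]
```

Because the chaining logic lives in the service, a change to how formats link to tools only requires updating `find_validators` on the server, which matches the single-place update argument above.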

FOCUS - Supplementary Tools

• Validation Software
  – Verifies and validates the file format of a given file.
• Rendering Software
  – Interprets the bit streams of files into a human-friendly representation on the screen.
• Editing Software
  – Adds, deletes, or modifies the contents of a given file while keeping the file format correct.
• Conversion Software
  – Converts a file from its format to current or emerging formats.

FOCUS Service Model

[Figure: Web Service Agent, Format Registry, Identification Service, Validation Software, Rendering Software, Conversion Software]

• Identifies the format of a specific digital object (DO) using the internal signature.
• Determines a verification service to verify the format of a specific DO.
• Identifies current rendering conditions for a specific digital format.
• Locates transformation services to convert a DO from its source format to the format of interest.
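
One way to picture the last item is as a search over registered conversion services. The slides only say that transformation services are located; chaining several services together, and using breadth-first search to find the shortest chain, are assumptions made for this sketch. The service names and the format graph are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# Hypothetical conversion services: source format -> {target format: service name}.
CONVERTERS = {
    "GIF 1989a": {"PNG": "gif2png"},
    "PNG": {"JPEG 2000": "png2jp2"},
}


def conversion_path(source: str, target: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search for the shortest chain of (service, output format) steps."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for next_fmt, service in CONVERTERS.get(fmt, {}).items():
            if next_fmt not in seen:
                seen.add(next_fmt)
                queue.append((next_fmt, steps + [(service, next_fmt)]))
    return None  # no registered chain of services reaches the target
```

Here `conversion_path("GIF 1989a", "JPEG 2000")` yields a two-step chain through PNG, and a request with no registered route returns `None`.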

Example Scenario: Digital Object Format Verification

[Figure: message flow among the Web Service Agent, the Format Registry, and the ID, Validation, Conversion and Rendering Services. Message labels: "Format?", "Format ID / Format Info", "Verifier?", "App ID / App Info", "Verify this?", "Valid/Well-formed"]

Step 1: The user requests identification of a file's format via the Web Service.
Step 2: The registry returns the format ID and format information.
Step 3: The user requests information on available verifiers for this format.
Step 4: The registry returns the validation service ID and information, such as its service location.
Step 5: The user connects to the validation service and verifies the format.
Step 6: The validation service returns the verification result.
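
The six steps can be walked through with local stubs standing in for the remote services. Everything concrete in this sketch is hypothetical: the IDs, the service location, the toy well-formedness check, and every result string other than "Valid/Well-formed", which is the label shown on the slide.

```python
from typing import Optional

# Hypothetical registry content: format ID -> validation service information.
VERIFIERS = {"fmt-pdf": {"app_id": "app-validator", "location": "validator.example.org"}}


def identify_format(data: bytes) -> Optional[dict]:
    """Steps 1-2: identification request; returns format ID and format info."""
    if data.startswith(b"%PDF-"):
        return {"format_id": "fmt-pdf", "name": "Adobe PDF"}
    return None


def find_verifier(format_id: str) -> Optional[dict]:
    """Steps 3-4: look up a validation service registered for the format."""
    return VERIFIERS.get(format_id)


def verify(service: dict, data: bytes) -> str:
    """Steps 5-6: stub for the remote validation service (toy check only)."""
    ok = data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]
    return "Valid/Well-formed" if ok else "Not well-formed"


def verify_object(data: bytes) -> str:
    """Run the whole scenario for one digital object."""
    fmt = identify_format(data)
    if fmt is None:
        return "Unknown format"
    service = find_verifier(fmt["format_id"])
    if service is None:
        return "No verifier registered"
    return verify(service, data)
```

In a deployment following the slides, the first two functions would be calls to the Web Service Agent, and `verify` would contact the validation service at the location returned in step 4.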

Demo

Conclusion

• The FOCUS design offers
  – Flexibility: the Web Service Agent can be easily tailored to meet the various needs of different preservation institutions.
  – Scalability: the format registry can also be distributed.
• FOCUS integrates current format preservation techniques and makes them available through a SOAP-based web interface.
• In summary, we believe that the FOCUS prototype represents a significant advance towards the development of a secure and scalable digital format registry.