Digital Preservation Cloud Services for Libraries and Archives DLF 2011 Baltimore, MD Quyen L. Nguyen NARA

Upload: qlnguyen

Post on 09-May-2015

Page 1: Digital Preservation Cloud Services for Libraries and Archives

Digital Preservation Cloud Services for Libraries and Archives

DLF 2011

Baltimore, MD

Quyen L. Nguyen – NARA

Page 2: Digital Preservation Cloud Services for Libraries and Archives

Outline

Introduction

LDPaaS

Levels of Service and Cost Model

Related Work

Conclusion

Oct. 31, 2011 2011 DLF Forum 2

Page 3: Digital Preservation Cloud Services for Libraries and Archives

Functional Requirements

Need for Long-Term Digital Preservation

– Policy mandates: retention of government records

– Knowledge function: preserve digitized books and born-digital materials

– History-oriented mandates: preservation of cultural heritage

Challenges

– Rapid growth of digital objects that require archiving.

– Data heterogeneity

Page 4: Digital Preservation Cloud Services for Libraries and Archives

Desired System Characteristics

Dynamic Scalability

– Increase as well as decrease

Cost-effective Maintainability

– Operation cost

– Patches: COTS, security.

Evolvability

– Technology refresh

– New features and services

Page 5: Digital Preservation Cloud Services for Libraries and Archives

Cloud Computing Characteristics

Elasticity

– Computing and storage resources

– Three levels of cloud services: IaaS, PaaS, and SaaS.

– Quick Provisioning (e.g. Cloud Market [3])

– Pay-as-you-go

Cost-efficient Maintenance

– Economies of scale

– Maximizing utilization of computing resources

Evolvability by configuration

Page 6: Digital Preservation Cloud Services for Libraries and Archives

OAIS Reference Model

Page 7: Digital Preservation Cloud Services for Libraries and Archives

LDPaaS

Long-term Digital Preservation as a Cloud Service

– Encompass major OAIS functionalities

– Not only a storage service,

– But also a preservation service according to the customer’s policies: retention period, preservation level, and access level.

Beneficial to Cloud Service Consumer

– Relieve records owners from the burden of engineering and provisioning preservation infrastructure

Beneficial to Cloud Service Provider

– Realize economies of scale by sharing unused computing resources

Page 8: Digital Preservation Cloud Services for Libraries and Archives

Ingest Provisioning Challenges

Unpredictability due to business policies

– Uneven flow of transfer volume

– Various object sizes, hence object numbers

– Various object types

Cloud Computing benefits:

– Computation resources: file format identification and application of integrity seal

– Storage resources: ingest processing buffer space

Page 9: Digital Preservation Cloud Services for Libraries and Archives

Access Provisioning Challenges

Unpredictability of publishing

– Volume of publishable data sets

Spikiness of Access request load

Access types: Storage Delivery Networks vs Content Delivery Networks.

Cloud Computing benefits

– Computation: access-time visualization, zooming, conversion to access format

– Storage: High-efficiency Access disk cache

Page 10: Digital Preservation Cloud Services for Libraries and Archives

Preservation Provisioning Challenges

Prominent preservation methods:

– Bit-level: error detection and correction capabilities

– Transformation: computing resources for transformation processes; storage serves as a scratchpad for transformation.

– Emulation: virtual machine requirements.

Cloud Computing benefits

– Computation: Execution of Preservation Algorithms

– Storage: Preservation Processing Buffer Space

Page 11: Digital Preservation Cloud Services for Libraries and Archives

Storage Provisioning Challenges

It is all about storage capacity

Scale of storage requirement and the role it may be best suited to:

– Hyper large-scale: Cloud Provider

– Moderate-to-small-scale: Cloud Consumer

Could there be a Community Cloud?

Page 12: Digital Preservation Cloud Services for Libraries and Archives

Software Paradigms

[Figure: software paradigms Structural, Object-oriented, SOA, and Cloud, with the label “Virtualization”]

Page 13: Digital Preservation Cloud Services for Libraries and Archives

System Architecture

Page 14: Digital Preservation Cloud Services for Libraries and Archives

SOA-based Ingest Process

Ingest

Virus Scan

File Format Identification

DROID

JHOVE

Metadata Extraction

Integrity Seal

Move to Preservation Storage

• Ingest Process implemented as composite service

• Could be implemented by BPEL.
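As a rough illustration of the composite-service idea, the ingest steps listed on this slide can be chained as plain functions. This is only a sketch: the `IngestItem` type, the function names, the toy signature table (a stand-in for tools such as DROID or JHOVE), the placeholder virus check, and the use of a SHA-256 digest as the "integrity seal" are all assumptions made for the example, not details taken from the presentation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IngestItem:
    """One transferred object moving through the ingest workflow (hypothetical type)."""
    name: str
    payload: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def virus_scan(item: IngestItem) -> IngestItem:
    # Placeholder check only; a real service would delegate to an antivirus engine.
    if b"EICAR" in item.payload:
        raise ValueError(f"{item.name}: rejected by virus scan")
    return item


def identify_format(item: IngestItem) -> IngestItem:
    # Toy magic-number lookup standing in for a format identification tool.
    signatures = {b"%PDF": "application/pdf", b"\x89PNG": "image/png"}
    item.metadata["format"] = next(
        (mime for magic, mime in signatures.items() if item.payload.startswith(magic)),
        "application/octet-stream",
    )
    return item


def extract_metadata(item: IngestItem) -> IngestItem:
    # Only the size is recorded here; real extraction would be format-specific.
    item.metadata["size_bytes"] = str(len(item.payload))
    return item


def apply_integrity_seal(item: IngestItem) -> IngestItem:
    # Assumption: the seal is modeled as a SHA-256 digest of the payload.
    item.metadata["sha256"] = hashlib.sha256(item.payload).hexdigest()
    return item


def ingest(item: IngestItem, preservation_store: Dict[str, IngestItem]) -> IngestItem:
    """Composite service: run each step in order, then move to preservation storage."""
    for step in (virus_scan, identify_format, extract_metadata, apply_integrity_seal):
        item = step(item)
    preservation_store[item.name] = item
    return item
```

Calling `ingest(IngestItem("report.pdf", b"%PDF-1.4 sample"), {})` runs the four steps in order and stores the sealed item. In an SOA deployment each function would be a separately deployed service and the loop would be replaced by an orchestration layer such as BPEL.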

Page 15: Digital Preservation Cloud Services for Libraries and Archives

LDPaaS Levels of Service

Service Levels

Ingest

IL1: Transfer Only

IL2: With Format Identification

IL3: Metadata Extraction

Preservation

PL1: Bit

PL2: Content

PL3: Content, Behavior & Formatting

Discovery

DL1: Metadata search

DL2: Full content search

Access

AL1: Passive Viewer

AL2: Interactive Viewer

AL3: Content Mining

Storage

SL1: Delayed Access - Near-Line Storage

SL2: Rapid Access - High Performance Storage

Content Server

CL1: Just-in-Time Active

CL2: Always Active

Page 16: Digital Preservation Cloud Services for Libraries and Archives

Level of Service Definitions

Definition 1.

Each Content Server has a set of LoS formalized by the following 6-tuple:

C = (CL, IL, PL, DL, AL, SL).

Definition 2.

Since a customer can have one or more Content Servers, a customer’s SLA is specified by the n-tuple:

L = (C1, …, Cn), if the customer has signed up for n Content Servers, with each Ci being a 6-tuple defined according to Definition 1.
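The two definitions map naturally onto typed records. The sketch below is one possible encoding, assuming Python enums for the service levels from the "Service Levels" slide; the class name `ContentServerLoS` and the `as_tuple` helper are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class CL(Enum):
    CL1 = "Just-in-Time Active"
    CL2 = "Always Active"


class IL(Enum):
    IL1 = "Transfer Only"
    IL2 = "With Format Identification"
    IL3 = "Metadata Extraction"


class PL(Enum):
    PL1 = "Bit"
    PL2 = "Content"
    PL3 = "Content, Behavior & Formatting"


class DL(Enum):
    DL1 = "Metadata search"
    DL2 = "Full content search"


class AL(Enum):
    AL1 = "Passive Viewer"
    AL2 = "Interactive Viewer"
    AL3 = "Content Mining"


class SL(Enum):
    SL1 = "Delayed Access - Near-Line Storage"
    SL2 = "Rapid Access - High Performance Storage"


@dataclass(frozen=True)
class ContentServerLoS:
    """Definition 1: C = (CL, IL, PL, DL, AL, SL)."""
    cl: CL
    il: IL
    pl: PL
    dl: DL
    al: AL
    sl: SL

    def as_tuple(self) -> Tuple[str, ...]:
        return tuple(level.name for level in (self.cl, self.il, self.pl, self.dl, self.al, self.sl))


# Definition 2: a customer's SLA is an n-tuple of Content Server definitions.
SLA = Tuple[ContentServerLoS, ...]

c1 = ContentServerLoS(CL.CL1, IL.IL2, PL.PL2, DL.DL1, AL.AL2, SL.SL2)
sla: SLA = (c1,)
```

Using enums means an invalid level (for example a nonexistent `DL3`) fails at construction time rather than when a price is computed.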

Page 17: Digital Preservation Cloud Services for Libraries and Archives

LoS - Example 1

Digital Library Repository

Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)

Content Server CL1 - Active Just-in-Time - this repository is sporadically used

Ingest Service IL2 - File Format Identification

Preservation Service PL2 - Preservation at the Content Level

Discovery Service: DL1 - Metadata Search

Access Service: AL2 - Interactive Viewer is provided for access.

Storage Service SL2 - Rapid Access, High Performance Disk - the volume is static

Page 18: Digital Preservation Cloud Services for Libraries and Archives

LoS - Example 2

Digital Library Repository for Research Publications

Two Sets of Records Stored in Two Different Content Servers: C1 and C2

C1 - Relatively Small Volume of High-Demand Digital Assets

C1 = (CL1, IL2, PL3, DL1, AL1, SL2)

CL1 - Active Just-in-Time Content Server

IL2 - File Format Identification

PL3 - Preservation at the Content and Formatting Level

DL1 - Metadata Search

AL1 - Passive Viewer

SL2 - High Performance, Rapid Access Storage

C2 - Backend Repository, Volume Increasing with Time

C2 = (CL2, IL2, PL3, DL2, AL1, SL1)

CL2 - Always Active Content Server

IL2 - File Format Identification

PL3 - Preservation at the Content and Formatting Level

DL2 - Full Content Search

AL1 - Passive Viewer

SL1 - Delayed Access Storage

Page 19: Digital Preservation Cloud Services for Libraries and Archives

LoS - Example 3

Sarbanes-Oxley Act Compliance Business Archive

Retain and Preserve Records in a Sliding Time Window of Seven Years

C1 = (CL1, IL2, PL1, DL2, AL1, SL1)

PL1 - Preservation Service at the Bit Level

Retention Period of Seven Years – Elaborate Preservation not Needed

SL1 - Delayed Access Storage

Archive Intended for Audit Purposes Only - Rapid Access to Data not Essential
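The sliding seven-year window can be expressed as a simple date comparison. This is a minimal sketch under stated assumptions: the function names are hypothetical, the window is measured from "today" back by whole calendar years, and the cutoff date itself is treated as still inside the window.

```python
from datetime import date


def retention_cutoff(today: date, years: int = 7) -> date:
    """Return the oldest creation date still inside the retention window."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year: fall back to Feb 28
        return today.replace(year=today.year - years, day=28)


def is_within_retention(created: date, today: date, years: int = 7) -> bool:
    """True if a record created on `created` must still be retained on `today`."""
    return created >= retention_cutoff(today, years)
```

A record for which `is_within_retention` returns false has slid out of the window and becomes a candidate for disposal under the customer's policy.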

Page 20: Digital Preservation Cloud Services for Libraries and Archives

Cost Model

Cost is one of the crucial elements in Cloud Computing

Let O = (V, N) be the body of N digital objects with total volume V

Cost (O, Service) depends on the level of service.

– Function of V or N or both.

Examples:

fIL1 - Utilization Cost for Digital Object Transfer, varies with V

fIL2 - File Type Identification, varies with N

fIL3 - Metadata Extraction, varies with N

TOTAL COST (O,C) = Σ Cost (O, Service), summed over Service ∈ {Ingest, Preservation, Discovery, Access, Storage}

Page 21: Digital Preservation Cloud Services for Libraries and Archives

Cost Model Example

Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1). Assume :

fCL2 (V,N) = 20V + 100 N;

fIL2 (N) = 10 N;

fPL1 (V) = 20 V;

fDL1 (N) = 30 N;

fAL1 (V) = 30 V;

fSL1 (V) = 40 V.

For Set O1 of Objects with V1 = 10 GB and N1 = 10^6

totalCost(O1,C1) = 110 V1 + 140 N1 = 140,001,100

For Set O2 of Objects with V2 = 10^3 GB and N2 = 10^2

totalCost(O2,C1) = 110 V2 + 140 N2 = 124,000

Note: totalCost(O2,C1) < totalCost(O1,C1), although V2 > V1
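The example can be checked mechanically. The sketch below encodes the six assumed cost functions exactly as listed on this slide (including the Content Server term fCL2) and sums them for both object sets. It reads the object counts and volumes as powers of ten (N1 = 10^6, V2 = 10^3 GB, N2 = 10^2); the names `COST_FUNCTIONS` and `total_cost` are invented for illustration. With these coefficients the sum reduces to 110 V + 140 N.

```python
from typing import Callable, Dict, Iterable

# Each function takes (volume in GB, number of objects) and returns a cost.
CostFn = Callable[[float, float], float]

COST_FUNCTIONS: Dict[str, CostFn] = {
    "CL2": lambda v, n: 20 * v + 100 * n,  # content server: depends on V and N
    "IL2": lambda v, n: 10 * n,            # format identification: per object
    "PL1": lambda v, n: 20 * v,            # bit-level preservation: per volume
    "DL1": lambda v, n: 30 * n,            # metadata search: per object
    "AL1": lambda v, n: 30 * v,            # passive viewer access: per volume
    "SL1": lambda v, n: 40 * v,            # near-line storage: per volume
}


def total_cost(volume_gb: float, num_objects: float, levels: Iterable[str]) -> float:
    """Sum the cost of every service level in a Content Server definition."""
    return sum(COST_FUNCTIONS[level](volume_gb, num_objects) for level in levels)


C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")
cost_o1 = total_cost(10, 10**6, C1)      # small volume, many objects
cost_o2 = total_cost(10**3, 10**2, C1)   # large volume, few objects
```

Because the per-object terms dominate here, the set with one million small objects costs far more than the set with a hundred large ones, which is the point the note makes.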

Page 22: Digital Preservation Cloud Services for Libraries and Archives

Related Work

CiteSeer study by Teregowda [2]:

– Examine each service in the architecture stack in terms of feasibility and cost of migrating and hosting in the Cloud.

– Possible integration with Cloud Storage thanks to current virtualized storage component.

DuraCloud [5]:

– Open source platform for digital libraries and archives

– Adapters to commercially available Cloud Storage services

Strategies and SLAs for bit-level preservation by Zierau [6]:

– Various sub-levels of bit-preservation.

www.cloudpreservation.com: archives and indexes data from websites and social networks.

www.ltdprm.org/ - Long-Term Digital Retention and Preservation Reference Model: cloud-based digital archive.

Page 23: Digital Preservation Cloud Services for Libraries and Archives

Conclusion

Proposed LDPaaS concept: why is it useful?

– Beneficial to large organizations

– Beneficial to small organizations

Notional cost model useful for establishing a price model associated with published SLA set.

Contend that Cloud Storage Service vendors can augment their portfolios to provide LDPaaS.

Community Cloud for Preservation

– Environment for more collaboration and sharing

Page 24: Digital Preservation Cloud Services for Libraries and Archives

References

1. Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM, Volume 53, No 4, April 2010.

2. P. Teregowda, B. Urgaonkar, and C. L. Giles. “Cloud Computing: A Digital Libraries Perspective”. 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, July 2010.

3. Stephen Abrams, Patricia Cruse, and John Kunze. “Preservation Is Not a Place”. The International Journal of Digital Curation, Issue 1, Volume 4, 2009.

4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O’Steen, Neil Jefferies, and Leslie Carr. “Towards Smart Storage for Repository Preservation Services”. The International Journal of Digital Curation, Issue 1, Volume 5, 2010.

5. DuraCloud. Available: http://www.duraspace.org/duracloud.php.

6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. “Evaluation of Bit Preservation Strategies”. 7th International Conference on Preservation of Digital Objects (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.

Page 25: Digital Preservation Cloud Services for Libraries and Archives

Disclaimer

The content of this presentation is the personal opinion of the author and does not necessarily reflect any position of the U.S. Government or the National Archives and Records Administration.

Page 26: Digital Preservation Cloud Services for Libraries and Archives

Thank You!

Any questions?

mailto:[email protected]
