Co-ordinated by aparsen.eu #APARSEN
Co-funded by the European Union under FP7-ICT-2009-6

Survey on Italian Preservation Repositories
Silvio Salza, CINI-Università di Roma “La Sapienza”
Storage Solution Webinar, April 14th 2014



The APARSEN WP23 questionnaire

• Questionnaire on Storage solutions and Scalability prepared and distributed as part of APARSEN WP23

• The questionnaire focused on:

- Profile of the repositories (mission, volumes, type of objects)

- Storage management policy

- Organization of the storage system

- Cost (TCO) and quality assessment

• CINI designed the questionnaire and was in charge of the Italian survey

• 8 large repositories in different areas were surveyed


Italian regulations on digital preservation

• Organization of Italian repositories mostly driven by national regulations

• Regulations were issued in a 2001 bill and later updated in 2010-14

• They have been mandatory for Public Administrations since 2001

• Private companies must comply as well for some types of records: health-care records, fiscal records, e-invoices etc.

Quite often the focus is just on complying with the regulations: the design of the repository and the quality of the preservation process are not given sufficient attention


Profile of the surveyed repositories

• Most repositories have been active for less than 5 years

• High yearly growth rate (average 100%)

• Generally a single type of digital object is preserved

• Access granted to registered users only

[Charts: Mission (Cultural Heritage, e-Gov, Other); Number of Digital Objects; Yearly increase (< 10%, 20% - 100%, > 100%)]


Storage management policy

• Only 50% of the repositories have a formally declared storage management policy (the ones in the e-gov area)

• None provided a link to a public policy document

• Crucial issues:

- Regular integrity checks (always specified)

- Backup interval (always specified)

- Data recovery workflow (specified in one case only)

The storage management policy should always be formally declared and, where possible, made public
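A regular integrity check of the kind listed among the crucial issues is commonly implemented by comparing stored checksums against freshly computed ones. The sketch below is only an illustration under that assumption, not the procedure of any surveyed repository; the function names (`sha256_of`, `build_manifest`, `check_integrity`) and the choice of SHA-256 are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_integrity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

In such a scheme the manifest would be kept on a different storage level from the data it describes, and any path reported by `check_integrity` would trigger the data recovery workflow.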


Three-level storage organization

Most repositories (but not all) declared a three-level storage organization:

- Preservation: core level devoted to preservation

- Access: front-end level to support external access

- Backup: back-end level for periodic dumps

[Diagram: Access, Preservation, Backup. The access level mirrors the core level and protects it from external accesses; the backup level holds periodic dumps of the core level]


Storage implementation

• In 3 cases there was no separate access level

• One repository claimed it unnecessary since it was using a WORM device (EMC2 Centera) for the core level

• Two others claimed RAID at the core level provided enough redundancy and the file system provided for write protection

• Backups typically made on a weekly basis

[Charts: technologies used at the Preservation, Access and Backup levels; legend entries: RAID5, RAID1, WORM, HD, Tape-DVD, None]


About storage media and systems

• Tape cartridges are OK for backup

• DVD and other consumer-level optical media should be avoided as too risky, but are still used in small repositories

• RAID replication at the core level is not equivalent to having a separate level for access (this would be a separate device)

• Using a single level of WORM devices, despite their quality, has some serious drawbacks:

- These devices typically rely on proprietary firmware

- Data can’t be read without the intermediation of the firmware

- Replication is still limited to a single device


Local versus geographical replication

• Replication is the key element to achieve reliability

• Different levels of replication:

- Device: within a given device (e.g. RAID5)

- Local: locally but involving different devices

- Geographical: replicated data kept in different locations

• Local (and device) replication is vulnerable to catastrophic (but not unlikely) events: flood, fire, earthquake

• Reliability of RAID systems assumes that faults of different devices are statistically independent (a tricky assumption!)

• If the room housing the devices is flooded, all of them will fail
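The effect of the independence assumption can be made concrete with a toy probability model. The sketch below is illustrative only: the function names and the failure probabilities are assumptions chosen for the example, not figures from the survey. It compares the probability of losing every copy when failures are independent, when all copies share one site, and when each copy sits at a different site.

```python
def loss_if_independent(p_device: float, copies: int) -> float:
    """Probability that all copies fail, if device failures are independent."""
    return p_device ** copies


def loss_single_site(p_device: float, copies: int, p_site_event: float) -> float:
    """All copies share one room: a site-wide event destroys every copy."""
    return p_site_event + (1.0 - p_site_event) * p_device ** copies


def loss_geographical(p_device: float, sites: int, p_site_event: float) -> float:
    """One copy per site; sites are assumed to fail independently."""
    per_site = p_site_event + (1.0 - p_site_event) * p_device
    return per_site ** sites


if __name__ == "__main__":
    P_DEVICE = 0.05       # assumed yearly failure probability of one device
    P_SITE_EVENT = 0.001  # assumed yearly probability of flood, fire, earthquake
    for n in (2, 3, 4):
        print(
            n,
            loss_if_independent(P_DEVICE, n),
            loss_single_site(P_DEVICE, n, P_SITE_EVENT),
            loss_geographical(P_DEVICE, n, P_SITE_EVENT),
        )
```

In this model, adding copies within a single site can never push the loss probability below the probability of the site-wide event, whereas adding sites keeps lowering it.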


The night of the earthquake

• The University of L’Aquila in Central Italy maintained and updated daily a backup copy of its records in a computing center in Bologna (some 300 km away)

• On April 6th 2009 an earthquake destroyed most of the city

• Thanks to the geographical replication, not a single record was lost

[Map: Bologna and L’Aquila]


How reliable is my repository?

• Interviewed repository managers were asked to give some figures to assess the quality of the preservation service:

- Reliability: probability of not losing any data in a given time

- Availability: percentage of time the system can be accessed

- Cost: TCO (Total Cost of Ownership) per TB/year

• Only a few provided answers to these questions

• Only very few answers were credible: one guy claimed his repository had achieved 100% reliability!! Can he fly too?

Inability to provide these figures is a clear indicator of the poor level of the design
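The three figures asked for in the questionnaire can be estimated from a handful of inputs. The sketch below is a minimal illustration: the `StorageOption` class, its field names and the simple formulas (constant yearly loss probability, cost spread evenly over the lifespan) are assumptions made for the example, not the method used in the survey.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760.0


@dataclass(frozen=True)
class StorageOption:
    name: str
    capital_cost: float             # purchase and setup cost
    yearly_running_cost: float      # power, staff, maintenance per year
    capacity_tb: float              # usable capacity in TB
    lifespan_years: float           # expected service life
    yearly_loss_probability: float  # chance of losing data in one year
    yearly_downtime_hours: float    # hours per year the system is unreachable

    def tco_per_tb_year(self) -> float:
        """Total cost of ownership divided by capacity and lifespan."""
        total = self.capital_cost + self.yearly_running_cost * self.lifespan_years
        return total / (self.capacity_tb * self.lifespan_years)

    def availability(self) -> float:
        """Fraction of time the system can be accessed."""
        return 1.0 - self.yearly_downtime_hours / HOURS_PER_YEAR

    def reliability(self, years: float) -> float:
        """Probability of not losing any data over the given number of years."""
        return (1.0 - self.yearly_loss_probability) ** years
```

Under this model a reliability of exactly 100% would require a yearly loss probability of zero, which no real storage system can guarantee.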


What about outsourcing?

• All storage levels in the surveyed repositories were in-house

• But part of the questionnaire dealt with outsourcing options

• Cloud storage was proposed as a main option to provide geographical replication, at least for some storage levels

• The result was discouraging: no answer, even if we insisted

• The attitude was like that of children saying: “I won’t eat it, and I won’t even taste it!”

One guy finally claimed that cloud was too expensive and too unreliable. But the same guy was unable to provide any figure for his own reliability and TCO


Conclusion 1: Improve the design process

• A good design should evaluate different alternatives

• Quantitative elements should be used to compare them:

- TCO

- Reliability

- Availability

- Level and type of replication

- Lifespan

• These elements also form the basis for assessing the quality of the preservation service

• The storage management policy should be clearly stated


Conclusion 2: Exploit new opportunities

• Improve reliability by exploiting redundancy

• Geographical redundancy is a key element

• Move at least one level of storage to a remote site: don’t put all your eggs in one basket

• Overcome prejudices about outsourcing:

- Why should in-house systems be better?

- One may get reasonable control of outsourced resources

- Special conditions can be negotiated

• Cloud storage is a great opportunity: it should be carefully considered before being dismissed

aparsen.eu #APARSEN

Network of Excellence