chapter 12 - basic preservation strategies

Upload: foveros-foveridis

Post on 05-Apr-2018


  • 7/31/2019 Chapter 12 - Basic Preservation Strategies


    Chapter 12

    Basic Preservation Strategies

    Strategy without tactics is the slowest route to victory. Tactics without strategy is the

    noise before defeat.

    (Sun Tzu)

    There are a number of basic preservation strategies upon which one can build more

    complex strategies. These are the ones which are described explicitly or implicitly

    by OAIS, based around ensuring that the digital object will be usable and under-

    standable to the Designated Community. Of course one also has to maintain the

    trail of information to support evidence of authenticity and other PDI.

    Many publications on digital preservation say that the available strategies may

    be summed up in the phrase emulate or migrate. We show here that this is

    inadequate.

    OAIS discusses some important aspects of information preservation as follows.

    The fast-changing nature of the computer industry and the ephemeral nature of

    electronic data storage media are at odds with the key purpose of an OAIS: to pre-

    serve information over a long period of time. No matter how well an OAIS maintains

    its current holdings, it will eventually need to migrate much of its holdings to dif-

    ferent media (which may or may not involve changing the bit sequences) and/or

    to a different hardware or software environment to keep them accessible. Today's

    digital data storage media can typically be kept at most a few decades before the

    probability of irreversible loss of data becomes too high to ignore. Further, the rapid

    pace of technology evolution makes many systems much less cost-effective after

    only a few years. In addition to the technology changes there will be changes to the

    Knowledge Base of the Designated Community which will affect the Representation

    Information needed.

    There are a number of fundamental approaches to information preservation. In

    the first the Content Data Object remains in its original form, and access and use is

    achieved by providing adequate descriptions of the digital encoding with Structure

    and Semantic Representation Information; in some cases the original access and

    use mechanisms are adequate, in which case software emulation (using Other

    Representation Information) may be useful, although this tends to limit the ways

    D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_12, © Springer-Verlag Berlin Heidelberg 2011


    in which the Content Data Object may be used. One advantage of leaving the bit

    sequences unchanged is that evidence of Authenticity is more easily sustained.

    Alternatively the object may be changed into one that can be processed with

    contemporary access and use mechanisms. This is referred to in OAIS as a

    Transformation, a type of Migration, which is discussed below. There are implications

    for Authenticity which are discussed in Chap. 13, particularly Sect. 13.6.2.

    The following matrix shows the various combinations of these alternatives.

                                Content data object unchanged        Content data object changed

    Access service unchanged    If using the original software       Re-implement access service
                                executable: emulation
                                If using the original source code:
                                rebuild executable

    Access service changed      Implement new access services based  Implement new access services based
                                on the representation information    on the representation information
                                describing the original content data describing the new content data
                                object                               object

    12.1 Description: Adding Representation Information

    As should be clear from the discussion in earlier chapters it is necessary to maintain

    the Representation Network so that it is adequate for a member of the Designated

    Community to continue to understand and use the digital object. However things

    change over time and so the Representation Network must be altered appropriately.

    In order to do this the techniques extensively discussed in Chap. 8

    to identify any potential gaps in the Representation Network can be

    used. Practical ways of doing this are described in detail in Chap. 16

    and illustrated in Part II.

    This approach allows the greatest flexibility because one can discover

    entirely new ways of looking at the digital objects. However, whilst it can be

    the most rewarding, it can also be the most difficult.

    12.2 Maintaining Access

    An alternative to using description is to maintain the current ways of accessing the

    digital object, and OAIS discusses several ways of doing this. One can think of this

    in terms of interfaces, either programmatic or user interfaces. In addition hardware

    emulation can be viewed as doing essentially the same thing but this deserves the

    more extensive discussion given in Sect. 7.9, although another type of emulation is

    described below.


    12.2.1 Access and Use Services

    OAIS discusses maintaining the Dissemination API in order to continue to support

    applications which the Designated Community uses to access and use the digital

    object. This is closely related to the ideas of virtualisation discussed in Sect. 7.8.

    The virtualisation approach has the advantage that it allows the

    Designated Community to use their favourite applications to access and

    use the digital object. This can be consistent with maintaining the Dissemination

    API by means of appropriate software wrappers. A number of options are discussed

    in some detail in Chap. 9.
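
    The wrapper idea can be illustrated with a minimal Python sketch. Everything in it is an
    assumption made for illustration: the function names, the sample data, and the choice of CSV
    and JSON as the original and transformed encodings are hypothetical and are not taken from
    OAIS. The point is only that the calling application sees the same result from the same kind
    of call, even though the underlying Content Data Object has changed.

```python
import csv
import io
import json

# Hypothetical Dissemination API: applications only ever ask for a table and
# receive a list of dict rows. None of these names come from OAIS itself.

def read_table_from_csv(csv_text):
    """Original access path: the Content Data Object is CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def read_table_from_json(json_text):
    """Wrapper for a transformed object held as a JSON array of objects.

    Values are converted to strings so that callers see the same shape of
    result as they did with the CSV-based implementation.
    """
    rows = json.loads(json_text)
    return [{key: str(value) for key, value in row.items()} for row in rows]

# Illustrative data only.
original = "station,temp_c\nA1,12.5\nB2,9.0\n"
transformed = '[{"station": "A1", "temp_c": "12.5"}, {"station": "B2", "temp_c": "9.0"}]'

rows_before = read_table_from_csv(original)
rows_after = read_table_from_json(transformed)
```

    In this sketch an application written against the list-of-rows result would not need to
    change when the archive replaces the CSV object with the JSON one.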

    12.2.2 Access Software Look and Feel

    This option focuses on the assumption that the Designated Community wishes to

    maintain the original look and feel of the Content Information of a set of AIUs as

    presented by a specified application or set of applications. Discussion of hardware

    emulation, which provides the ultimate maintenance of look and feel, is provided in

    Sect. 7.9. Conceptually, the OAIS provides (i.e. makes available/points to) a software

    environment that allows the Consumer to view the AIU's Content Information

    through the application's transformation and presentation capabilities. For example,

    there may be a desire to use a particular application that extracts data from an ISO

    9660 CD-ROM and presents it as a multi-spectral image. This application runs under

    a particular operating system, requires a set of control information and use of a CD-ROM

    reading device, and presents the information to driver software for a particular

    display device. In some cases this application may be so pervasive that all members

    of the Designated Community have access to the environment and the OAIS merely

    designates the Content Data Object to be the bit string used by the application.

    Alternatively, an OAIS may supply (as Representation Information) such an envi-

    ronment, including the Access Software application, when the environment is less

    readily available. However, as the OAIS and/or the Designated Community moves

    to new computing environments, at some point the application will cease to function

    or will function incorrectly. At such a point Transformation will become an

    attractive option.

    12.2.2.1 Emulation of Look and Feel the Hard Way

    It is worth discussing in a little more detail another way of maintaining look and

    feel when, for example, the compiled version of the application, or the libraries it depends

    upon, is not available, nor is the source code. The term emulation may be applied to

    this technique since emulation may be defined as the ability of a computer program

    or electronic device to imitate another program or device [79].

    The OAIS may, despite the drawbacks, consider emulation for the access applica-

    tion in the following way. If the application provides a well-known set of operations

    and a well-defined API for access, the API could be adequately documented and


    tested to attempt an emulation of that application. However, if the consumer inter-

    face is primarily one of display or other devices which affect human senses (e.g.,

    sound), this reverse engineering becomes nearly impossible, because it may not be

    obvious when the application runs but does not function correctly for all possible

    inputs. To guarantee the discovery of all such situations, it would be necessary

    to record the Access Software's correctly functioning output, and preserve this

    alongside the emulation. The recorded behaviour would need to be checked against the results

    obtained from the emulation. This may be quite difficult if the application has

    many different modes of operation. Further, if the application's output is primarily

    sent to a display device, recording this stream does not guarantee that the display

    looks the same in the new environment and therefore the combination of applica-

    tion and environment may no longer be giving completely correct information to the

    Consumer.
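
    The idea of checking an emulation against recorded output can be sketched as follows. This
    is only an illustration under stated assumptions: the "original Access Software" is imagined
    as a call that turns a colour triple into a hexadecimal string, and the names
    RECORDED_OUTPUTS, emulated_render and validate_emulation are hypothetical.

```python
# Recorded behaviour of the (hypothetical) original Access Software:
# each input is paired with the output produced while it still functioned.
RECORDED_OUTPUTS = {
    (0, 0, 0): "#000000",
    (255, 255, 255): "#ffffff",
    (18, 52, 86): "#123456",
}

def emulated_render(rgb):
    """Hypothetical re-implementation of the original rendering call."""
    red, green, blue = rgb
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)

def validate_emulation(emulated, recorded):
    """Return the recorded inputs for which the emulation disagrees."""
    return [inp for inp, expected in recorded.items() if emulated(inp) != expected]

mismatches = validate_emulation(emulated_render, RECORDED_OUTPUTS)
```

    An empty list of mismatches shows agreement only for the inputs that were recorded; inputs
    outside the recording remain unchecked, and a comparison of this kind says nothing about how
    a display device finally renders the output.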

    Maintaining a consistent look and feel may require, as a starting point, capturing

    that look and feel with a separate recording to use as validation information.

    In general, it may be difficult if not impossible to formally describe the look and

    feel. However, a number of Transformational Information Properties may essen-

    tially define criteria against which preservation may be tested; validation against

    these Information Properties would be a necessary, although not always sufficient,

    condition for testing the adequacy of the preservation activity.

    12.3 Migration/Transformation

    At some point it may be decided that maintaining the original medium or the

    Representation Network for a digital object is not practical for cost reasons, or does

    not meet requirements for some other reason. Therefore the digitally encoded infor-

    mation must be encoded in some other way, either the same bit sequences on new

    media or else changed bit sequences.

    It is possible to identify four primary digital Migration types. The primary types,

    ordered by increasing risk of information loss, are:

    1. Operations which do not change the bit sequences

    Refreshment: A Digital Migration where a media instance, holding one or

    more AIPs or parts of AIPs, is replaced by a media instance of the same

    type by copying the bits on the medium used to hold AIPs and to manage

    and access the medium. As a result, the existing Archival Storage mapping

    infrastructure, without alteration, is able to continue to locate and access

    the AIP. As discussed at the start of the book many processes go on to translate from

    magnetic domains (for a magnetic disk) to bits. This bit copy may not be a

    physical copy.

    Replication: A Digital Migration where there is no change to the Packaging

    Information, the Content Information and the PDI. The bits used to convey

    these information objects are preserved in the transfer to the same or new


    media-type instance. Refreshment is also a Replication, but Replication may

    require changes to the Archival Storage mapping infrastructure.

    2. Operations which change the bit sequences

    Repackaging: A Digital Migration where there is some change in the bits

    of the Packaging Information.

    Transformation: A Digital Migration where there is some change in the

    Content Information or PDI bits while attempting to preserve the full

    information content. This deserves some extended discussion, which follows.
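
    The four Migration types can be told apart by asking which bits changed. The following
    Python sketch is a simplification under stated assumptions: an AIP is modelled as a dict with
    the hypothetical keys 'packaging', 'content' and 'pdi', the media type is not modelled at all,
    and a single flag stands in for whether the Archival Storage mapping infrastructure had to be
    altered. It is not an OAIS-defined procedure.

```python
import hashlib

def digest(data):
    """SHA-256 digest of a byte string, used here as a fixity check."""
    return hashlib.sha256(data).hexdigest()

def classify_migration(before, after, mapping_changed=False):
    """Classify a migration from the parts of an AIP before and after.

    `before` and `after` are dicts with the hypothetical keys 'packaging',
    'content' and 'pdi', each holding bytes. `mapping_changed` says whether
    the Archival Storage mapping infrastructure had to be altered in order
    to locate the new copy (an assumption of this sketch).
    """
    changed = {part for part in ("packaging", "content", "pdi")
               if digest(before[part]) != digest(after[part])}
    if changed & {"content", "pdi"}:
        return "Transformation"
    if "packaging" in changed:
        return "Repackaging"
    return "Replication" if mapping_changed else "Refreshment"

# Illustrative data only.
aip = {"packaging": b"tar-header", "content": b"image-bits", "pdi": b"provenance"}
same_bits = classify_migration(aip, dict(aip))
new_wrapper = classify_migration(aip, {**aip, "packaging": b"zip-header"})
```

    The order of the tests follows the ordering by risk given above: a change to Content
    Information or PDI bits is reported as a Transformation even if the packaging also changed.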

    12.3.1 Transformation

    Transformation implies a change in the bit sequence of either the Content

    Information or the PDI.

    In many discussions of digital preservation the term Migration is used

    when in fact what is meant is specifically Transformation because

    the aim in those discussions is to change the digital encoding of the

    information.

    Given a certain piece of information there could be many different ways of

    encoding it digitally. For example an image could be encoded as a TIFF file or a

    JPEG; a document could be held as Word or PDF; a table containing scientific data

    could be held as a FITS table or as a CSV (comma-separated values) file. Each of

    these alternatives would need its own, different, Representation Network.

    However some Transformations make more sense than others. This will com-

    monly be regarded as changing from one data format to another, but one must also

    think about the associated semantics. Some formats have little or no room for the

    semantics. Another consideration is the number and types of applications commonly

    associated with the various formats.

    For example an image could be regarded as a table where each of the cells con-

    tains a number. However it would not make good sense to encode the image as a

    CSV file because of the loss of semantics involved. Moreover the applications (e.g.

    spreadsheet programmes) normally used to deal with a CSV file do not normally

    display the data as one would expect an image to be displayed.

    With regard to the semantics, one can supplement the capabilities of a particular

    format with something else, e.g. the CSV file could have an associated text file

    to supply the semantic information, such as the meanings of the columns,

    which would otherwise be missing. In this case one would need the Representation

    Information for (1) the CSV file (2) the text file and (3) the relationship between

    them. While this is possible, the more attractive option would be to choose a

    new format which can itself handle the required semantics, with available appli-

    cations that supply the required functionality, at least as well as the original format.

    Therefore given a piece of digitally encoded information that one needs to preserve,

    the transformation which one should reasonably apply is not arbitrary.
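
    The CSV-plus-text-file arrangement described above can be sketched in Python. The file
    contents, the "name = description [unit]" layout of the companion file, and the function
    names are all hypothetical choices made for this illustration; a real archive would need to
    document whatever convention it actually uses as Representation Information.

```python
import csv
import io

# Hypothetical Content Data Object: a CSV file with terse column names.
CSV_TEXT = "t,flux\n0.0,1.25\n0.5,1.31\n"

# Hypothetical companion text file: one "column = meaning [unit]" per line.
SIDECAR_TEXT = (
    "t = time since start of observation [s]\n"
    "flux = measured flux density [Jy]\n"
)

def parse_sidecar(text):
    """Read 'name = description [unit]' lines into a dict keyed by column."""
    meanings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, rest = line.partition("=")
        description, _, unit = rest.strip().rpartition("[")
        meanings[name.strip()] = (description.strip(), unit.rstrip("]").strip())
    return meanings

def missing_semantics(csv_text, sidecar_text):
    """Columns present in the CSV header but not described in the sidecar."""
    header = next(csv.reader(io.StringIO(csv_text)))
    described = parse_sidecar(sidecar_text)
    return [column for column in header if column not in described]

meanings = parse_sidecar(SIDECAR_TEXT)
undocumented = missing_semantics(CSV_TEXT, SIDECAR_TEXT)
```

    The check in missing_semantics is one small instance of item (3), the relationship between
    the two files: it only works because the column names in the CSV header match the names used
    in the companion file, and that convention itself has to be preserved.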


    There are deep reasons for making a careful choice and documenting that choice

    appropriately. This is discussed in detail in Sect. 13.6.

    However there are a number of useful points which should be made here. For

    example one can think of the ideal Transformation in which the new digital object

    has the same information as the original. If this is the case then it should be possible

    to confirm this by means of another Transformation back to the original bit

    sequence. If one can find this pair of Transformations then one can define (following

    the revised version of OAIS):

    Reversible Transformation: A Transformation in which the new represen-

    tation defines a set (or a subset) of resulting entities that are equivalent to

    the resulting entities defined by the original representation. This means that

    there is a one-to-one mapping back to the original representation and its set

    of base entities.

    On the other hand if one looks at the other transformations mentioned above, for

    example from FITS to CSV, then one would, without additional information, e.g.

    the supplementary text file mentioned above, lose information and therefore not be

    able to make the reverse transformation.

    It is therefore reasonable to define:

    Non-Reversible Transformation: A Transformation which cannot be guar-

    anteed to be a Reversible Transformation.

    An important point to note is that the definition of non-reversible is drawn as

    broadly as possible. For example one does not need to prove there is no

    backward transformation, only that one cannot guarantee that such a transformation

    can be constructed.
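
    The round-trip test behind these two definitions can be sketched in Python. This is an
    illustration under stated assumptions: lossless compression stands in for a Transformation
    that can be undone, a hypothetical rounding step stands in for one that cannot, and the
    function names are invented for the example.

```python
import zlib

def is_round_trip(original, forward, backward):
    """True if backward(forward(x)) reproduces the original bit sequence.

    This checks a single object only; it is evidence for, not proof of,
    a Reversible Transformation over all possible inputs.
    """
    return backward(forward(original)) == original

# Illustrative data only.
table = b"time,flux\n0.0,1.2500\n0.5,1.3125\n"

# A lossless re-encoding: compression followed by decompression.
lossless_ok = is_round_trip(table, zlib.compress, zlib.decompress)

def round_values(data):
    """Hypothetical lossy re-encoding: keep one decimal place per number."""
    lines = data.decode("ascii").splitlines()
    out = [lines[0]]
    for line in lines[1:]:
        out.append(",".join("{:.1f}".format(float(v)) for v in line.split(",")))
    return ("\n".join(out) + "\n").encode("ascii")

# The discarded digits cannot be restored from the output alone, so the
# identity function, used here as a candidate inverse, fails the round trip.
lossy_ok = is_round_trip(table, round_values, lambda data: data)
```

    In line with the broad definition above, a failed round trip with one candidate inverse
    does not prove that no inverse exists; it only means that reversibility has not been
    demonstrated, which is enough to treat the Transformation as Non-Reversible.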

    We will come back to these definitions in Chap. 13 where they play an important

    role in considerations of Authenticity.

    12.4 Summary

    This chapter has raced through a number of the basic preservation strategies and

    techniques; it should be clear that each technique has its own strengths and weak-

    nesses, and one must be careful to recognise these. The reader must be careful not

    to be misled by the amount of material on emulation here; this was a useful loca-

    tion for this material. Other preservation techniques are discussed in much more

    detail throughout this book. Other chapters are devoted to descriptive Representation

    Information and also to Transformations.

    In Part II we provide examples of many of these techniques with evidence to

    support their efficacy when applied appropriately.
