chapter 4 - types of digital objects

Upload: foveros-foveridis

Post on 05-Apr-2018


  • 7/31/2019 Chapter 4 - Types of Digital Objects

    1/9

    Chapter 4 Types of Digital Objects

    There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy. (William Shakespeare, Hamlet)

    There are many types of digital objects which we may come across, and we need to recognise the extent of their diversity; otherwise we will aim too low when we design our tools and techniques for digital preservation.

    It is impossible to give an exhaustive list of types of digital objects, yet it is useful to remind ourselves of at least some of the great variety that we must be able to deal with. By types we mean not just different formats, but rather different classifications.

    One reason for being interested in the variety of types is that unless one is aware of the distinctions it is very easy to assume that everything is the same and the same tools can be used. For example if one normally deals with the preservation of documents, such as Word or PDF, then one might assume that all digitally encoded information can be preserved using the same tools. Unfortunately this is not true, as we will see. The next sections present a brief overview of some of the distinctions which can be made, without any claim of being exhaustive.

    4.1 Simple vs. Composite

    One way to classify digital objects is by whether they are normally treated as a whole, for example an image such as Fig. 4.1, or whether they are normally treated as a collection of simpler parts, for example a FITS file which has several images and tables, as in Fig. 4.2. The latter we will call Composite Objects (or sometimes Complex Objects).

    It is important to make this distinction because if we can break the preservation challenge of a composite object into smaller components then it will make the preservation task easier. On the other hand if we treat the composite object as if it were a simple one then we could run into a great deal of trouble in future.
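The idea of breaking a composite object into simpler parts can be sketched as a small tree structure. This is only an illustration under stated assumptions: the class `DigitalObject`, its method names, and the file name `observation.fits` are hypothetical and do not come from the FITS standard or any library; the part names follow Fig. 4.2.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical model: an object is simple if it has no parts."""
    name: str
    parts: list = field(default_factory=list)

    def is_composite(self):
        return bool(self.parts)

    def simple_parts(self):
        """Yield the simple objects that together make up this object."""
        if not self.parts:
            yield self
        else:
            for part in self.parts:
                yield from part.simple_parts()


# A FITS file laid out as in Fig. 4.2 (the file name is an assumption).
fits_file = DigitalObject(
    "observation.fits",
    [DigitalObject(n) for n in ("Header", "Image 1", "Image 2", "Table 1", "Table 2")],
)
# A simple object, treated as a whole.
face = DigitalObject("face.jpg")
```

Walking `fits_file.simple_parts()` gives the five components that could each be handled as a smaller preservation task, while `face` yields only itself.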

    D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_4, © Springer-Verlag Berlin Heidelberg 2011

    Fig. 4.1 A simple image face.jpg

    Fig. 4.2 FITS file as a composite object (parts shown: Header, Image 1, Image 2, Table 1, Table 2)

    However it is never completely clear cut because whether a digital object is simple or composite often depends upon the eye of the beholder. Nevertheless this is a useful distinction to draw.

    A Word document may normally be treated as a simple object. In actual fact it is, internally, very complex, containing information about styles and page layout etc. However one normally disregards this because the software we use deals with the Word file as a whole. On the other hand some Word files have embedded

    On the other hand one can have a digital object for which it is not enough to simply render it but for which one needs to know what the contents mean in order to be able to further process it.

    It is useful to make this distinction because it is easy to think that every digital object is simply rendered; that every digital object need only be displayed.

    Indeed one could argue that the ultimate user of a digital object is a human who needs to see or hear (or perhaps in future to feel, taste or smell) the result. For example even a FITS image is (often) displayed.

    However displaying a FITS image is rarely the ultimate aim. Instead an astronomer might want to make measurements which require an understanding of the units and coordinate systems. He/she might also reasonably want to combine this piece of data with another. In other words what is wanted is to do more than render it in one particular way; instead there is an enormous variety of ways users may want to deal with the object.

    When we are thinking about digital preservation we must look to the future, not in order to guess what it may be but rather to recognise that it may be different from today. Therefore we need to identify what someone (at least the Designated Community) needs in order to understand and use a non-rendered digital object in any number of different ways.

    For example consider two text files. In one case one can have some English text, say a recipe for a cake in a file recipe.txt (see Fig. 4.4). Using a Windows PC the file is easily readable because the .txt part of the name lets the machine try an application which can display an ASCII encoded file, which is what this is. Normally one would say that no special knowledge is needed to understand this; it simply needs to be read.

    However there is a requirement to be able to read English and also to know what the various measures are (for example what size is a cup?) and also to know what the ingredients are (for example what is lemon zest?); without such knowledge the recipe is neither understandable nor usable.

    Take 2 eggs

    Add 3 cups of gram flour

    Add 2 tsp lemon zest

    ......

    Fig. 4.4 Text file recipe.txt

    4.2 Rendered vs. Non-rendered

    Consider now another text file (table.txt) which, as a simple .txt file, is easily readable on a PC; again the .txt usually lets us guess, correctly in this case, that this is an ASCII encoded file.

    In this case we are more obviously in some trouble because although we can see something which we can reasonably assume are numbers, we do not know what the numbers mean.

    If we are told that the numbers under the headings X, Y and Z provide us with the sides of a rectangular cuboid, then we can calculate the volume of that shape using the formula X × Y × Z for each row, namely 14.742, 31.8 and 114.034.

    On the other hand we might be told that X is the longitude on Earth and Y the latitude, both measured in degrees, and Z is the concentration of a certain chemical in parts per billion.

    We see that the format alone is insufficient; one needs to know what the contents (e.g. the numbers) mean.
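The point can be illustrated with a minimal sketch in which the same parsed rows of table.txt are given the two different meanings described above. The function names and dictionary keys are assumptions made for illustration; only the numbers and the two interpretations come from the text.

```python
# Contents of table.txt as shown in Fig. 4.8.
TABLE_TXT = """X Y Z
1.3, 2.7, 4.2
2.4, 5.3, 2.5
7.4, 2.3, 6.7
"""


def read_rows(text):
    """Parse the text into a header and rows of floats; this needs only the format."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = [[float(value) for value in line.split(",")] for line in lines[1:]]
    return header, rows


def as_cuboid_volumes(rows):
    """Interpretation A: X, Y, Z are the sides of a rectangular cuboid."""
    return [round(x * y * z, 3) for x, y, z in rows]


def as_concentrations(rows):
    """Interpretation B: X = longitude (degrees), Y = latitude (degrees), Z = concentration (ppb)."""
    return [{"lon_deg": x, "lat_deg": y, "ppb": z} for x, y, z in rows]


header, rows = read_rows(TABLE_TXT)
volumes = as_cuboid_volumes(rows)        # [14.742, 31.8, 114.034]
observations = as_concentrations(rows)
```

Nothing in the file itself says whether `as_cuboid_volumes` or `as_concentrations` is the right function to apply; that knowledge has to be supplied from outside the bits.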

    By Non-Rendered Digital Object we mean things which, like table.txt, are not simply rendered but rather are to be processed to produce any number of possible outputs. For example table.txt could be plotted, or displayed as a pie-chart or histogram. Alternatively the information in the columns of table.txt could be used to calculate the density of chlorophyll in the Amazon rain forest (if that is the sort of information there is in table.txt).

    As another example one can take a digital object from the GOME instrument [21], which might be as shown in Figs. 4.5, 4.6, and 4.7.

    Fig. 4.5 GOME data binary

    Fig. 4.6 GOME data as numbers/characters

    Fig. 4.7 GOME data processed to show ozone data with particular projection

    We can also have two files of the same format, say a sound file such as MP3, the first of which (music.mpg) is indeed something that can be used to play music, but a second, also an MP3 file (config.mpg), which contains numbers which are configuration parameters for setting up some software. If we click on the first on a home computer then it will play some music because the .mpg causes the computer to try to use a music application. Clicking on the second will cause the computer to try to use that same application but it may produce only a brief grating sound, or perhaps nothing audible at all.

    The important points are that we currently rely on many clues, such as having a file ending .txt or .mpg, which many computers use to choose an application for displaying or playing the file. On the other hand, even now these clues are insufficient, as with table.txt (Fig. 4.8).

    X Y Z
    1.3, 2.7, 4.2
    2.4, 5.3, 2.5
    7.4, 2.3, 6.7

    Fig. 4.8 Text file table.txt

    Of course computers are not intelligent; in fact they have been instructed which applications to use for which file extensions, for example Notepad for files with names ending in .txt. Sometimes this does not do what is expected, as with config.mpg. In other cases we can do something with the file but not very much, as with table.txt.
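The way a computer is "instructed" can be sketched as a simple lookup on the file name. The association table below is a hypothetical stand-in for whatever an operating system actually keeps, and the application names other than Notepad are assumptions; the sketch only shows that the choice never looks at the contents of the file.

```python
import os

# Hypothetical association table; real systems store this differently.
APPLICATION_FOR_EXTENSION = {
    ".txt": "Notepad",
    ".mpg": "music player",
    ".jpg": "image viewer",
}


def choose_application(filename):
    """Pick an application from the file name alone, never inspecting the bits."""
    _, extension = os.path.splitext(filename)
    return APPLICATION_FOR_EXTENSION.get(extension.lower(), "unknown")


music_app = choose_application("music.mpg")
config_app = choose_application("config.mpg")
table_app = choose_application("table.txt")
```

Here music.mpg and config.mpg are sent to the same application even though only one of them contains music, and table.txt is opened in a text viewer that can show the numbers but cannot say what they mean.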

    Some others mentioned in the introduction, such as family photographs (face.jpg, Fig. 4.1), are very similar in that what one expects is to display or play the contents of the file and then it is up to the viewer, or listener, to understand it. Of course one is not listening to the bits; what we mean is that there is an application which is used to convert the bits to an image or a sound. The application may also allow one to zoom in to part of an image or search for a piece of text or copy a piece of music and insert it in a separate file. But even without these extra functions, one can make use of the file, by which we mean we can look at or hear the output of the application and we would be quite happy if that was all we could do.

    These types of files (let's use the term Digital Object as a more general term instead of file) we will refer to as Rendered Digital Objects. For these types of objects it is (currently) normally regarded as sufficient if in future one can simply display it if it is an image or movie, or play it if it is a sound.

    These are the types of digital objects which one commonly deals with in everyday life: documents, images, web pages etc. There are many books which talk about the preservation of these kinds of objects:

    • word processor documents
    • financial files
    • spreadsheets
    • databases of various sorts
    • . . . . .

    Throughout this book we will also look at examples from a variety of disciplines including science, cultural heritage and contemporary performing arts.

    Science

    • Observations of the Earth from space, including multi-spectral images, synthetic aperture radar images
    • Measurements of the atmosphere, chemical or electrical composition
    • Software for processing raw data to data which is scientifically useful

    Cultural Heritage

    • Laser scans of buildings and artefacts
    • Plans of buildings
    • 3-D virtual reality models

    Performing Arts

    • patch file for processing what the performer plays
    • configuration file which maps video capture of movement to musical performance

    All the above are just some of the examples of non-rendered data which are of importance to society.

    4.3 Static vs. Dynamic

    Digital objects do (usually) need software and hardware to extract information from the bits as discussed in Sect. 1.1. Static objects are ones which, unless they are transformed, are unchanged as bit sequences. These we will refer to as Static Digital Objects.

    On the other hand we can think about database files which naturally change over time as entries are changed. Alternatively we can consider a whole collection of files as the data object. Such a collection might change as additional files are added to the collection over time. Such digital objects we will refer to as Dynamic Digital Objects.

    Of course at any particular time the Dynamic Digital Object is a particular Static Digital Object which we may preserve. On the other hand it may be of interest, in the case of a Dynamic Digital Object, to know what the state of the object was at any particular time. In fact some would argue that most datasets change over time and the state at each particular moment in time may be important. This is an important area requiring further research; however from the point of view of this book it may be useful to break the issue into separate parts. At each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Network. Efficient storage of a series of snapshots may lead one to store differences or include time tags in the data. Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.
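One way to picture "storing differences with time tags" is the sketch below. It is an assumption-laden illustration, not a method prescribed by the text: snapshots are modelled as plain dictionaries, time tags as integers, and the names `SnapshotSeries`, `diff` and `apply_delta` are hypothetical. The `at` method plays the role of the additional description of how to recover a particular time's snapshot from the efficiently encoded version.

```python
_MISSING = object()  # sentinel for keys absent from the older snapshot


def diff(old, new):
    """Record what changed between two snapshots (dicts of key -> value)."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(snapshot, delta):
    """Return a new snapshot with one stored difference applied."""
    result = dict(snapshot)
    result.update(delta["changed"])
    for key in delta["removed"]:
        result.pop(key, None)
    return result


class SnapshotSeries:
    """Stores the first snapshot in full and every later one as a time-tagged difference."""

    def __init__(self, time_tag, first_snapshot):
        self.base_time = time_tag
        self.base = dict(first_snapshot)
        self.deltas = []  # (time_tag, delta) pairs, in increasing time order
        self._latest = dict(first_snapshot)

    def add(self, time_tag, snapshot):
        self.deltas.append((time_tag, diff(self._latest, snapshot)))
        self._latest = dict(snapshot)

    def at(self, time_tag):
        """Rebuild the state of the object as it was at time_tag."""
        if time_tag < self.base_time:
            raise KeyError("no snapshot exists before the first time tag")
        state = dict(self.base)
        for tag, delta in self.deltas:
            if tag > time_tag:
                break
            state = apply_delta(state, delta)
        return state


series = SnapshotSeries(1, {"a": 1, "b": 2})
series.add(2, {"a": 1, "b": 3, "c": 4})
series.add(3, {"b": 3, "c": 4})
```

Calling `series.at(2)` replays only the differences tagged up to time 2. Without knowledge of this replay procedure the stored differences could not be turned back into snapshots, which is why it has to be preserved alongside the data.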

    4.4 Active vs. Passive

    One other useful distinction is between what may be called active and passive digital objects.

    By Passive Digital Object we mean something with which things are done, for example used by other applications (software) to do something. For example a document file is used by a word processing programme to print the document or display it on the screen, or an astronomical image in a FITS file would be used by astronomical analysis software to do scientific research. Such digital objects are often referred to as data, but since the term Data Object is already used by OAIS we prefer the term Passive Digital Object.

    An Active Digital Object on the other hand does something. For example the word processing application or the astronomical analysis software mentioned in the previous paragraph might be the digital objects to be preserved.

    Once again there will always be fuzzy boundaries, so one could consider an Access[TM] database as a Passive Digital Object used by the Access software, but it could easily itself contain software (for example some form of BASIC) which would mean that it could be considered to be an Active Digital Object.

    4.5 Multiple-Classifications

    The classifications are not mutually exclusive, and in fact one can think of a simple-rendered-static-passive object; the image face.jpg is an example of this. One can also have a composite-non-rendered-dynamic-active object such as a database with built-in queries into which new rows are being inserted. The Word.exe executable file may be thought of as a composite-non-rendered-static-active object.
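The four distinctions can be treated as independent axes, each with two values. The following sketch is illustrative only; the enum and class names are assumptions, while the three example classifications are the ones given above.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Structure(Enum):
    SIMPLE = "simple"
    COMPOSITE = "composite"


class Rendering(Enum):
    RENDERED = "rendered"
    NON_RENDERED = "non-rendered"


class Change(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Role(Enum):
    PASSIVE = "passive"
    ACTIVE = "active"


@dataclass(frozen=True)
class Classification:
    structure: Structure
    rendering: Rendering
    change: Change
    role: Role

    def label(self):
        """Join the four axis values in the order used in the text."""
        axes = (self.structure, self.rendering, self.change, self.role)
        return "-".join(axis.value for axis in axes)


face_jpg = Classification(Structure.SIMPLE, Rendering.RENDERED, Change.STATIC, Role.PASSIVE)
live_database = Classification(Structure.COMPOSITE, Rendering.NON_RENDERED, Change.DYNAMIC, Role.ACTIVE)
word_exe = Classification(Structure.COMPOSITE, Rendering.NON_RENDERED, Change.STATIC, Role.ACTIVE)

# Every combination of the four two-valued axes.
all_combinations = [Classification(*combo) for combo in product(Structure, Rendering, Change, Role)]
```

With four two-valued axes there are sixteen possible combinations, which is one reason a three-dimensional drawing can show only part of the picture.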

    Figure 4.9 shows a representation of multiple classifications, although we are limited to drawing in 3 dimensions!

    Fig. 4.9 Types of digital objects (axes shown: Rendered / Non-Rendered, Static / Dynamic, Simple / Complex)

    4.6 Summary

    The purpose of this chapter has been to provide a partial view of the variety of types of digital objects which exist in the wild and which one might be required to preserve. The reason has been to ensure that the reader can at least recognise the possibilities when confronted with the challenge of preserving a digital object. Later chapters will discuss preservation techniques for some of this multitude of possibilities.