7/31/2019 Chapter 9 - Understandability and Usability of Data
http://slidepdf.com/reader/full/chapter-9-understandability-and-usability-of-data 1/9
Chapter 9
Understandability and Usability of Data
I hear and I forget. I see and I remember. I do and I understand.
(Confucius)
Ensuring that digitally encoded information remains usable and understandable over
time is, together with authenticity, at the heart of digital preservation. The previous
chapter discussed some of the formal aspects of intelligibility. This chapter discusses
the complementary issue of usability of the data.
Usable means “capable of use” (OED), “available or convenient for use”
(www.dictionary.com).
In design, usability is the study of the ease with which people can employ a
particular tool or other human-made object in order to achieve a particular goal. In
human-computer interaction and computer science, usability studies the elegance
and clarity with which the interaction with a computer program or a web site is
designed (Wikipedia).
Here, by usable we mean that someone is able to do something sensible with the
information it contains. We recognise that this might not be easy – but at least it
should be possible to carry out.
One could of course use a digital object simply by printing out its constituent
sequences of “1”s and “0”s on paper and using this to decorate one’s home.
However it seems reasonable to suppose that this has little to do with the
information content in the digital object – unless of course that is what it was
designed for. For example the Arecibo message [130] was designed to be understood
by extra-terrestrials. This consisted of a sequence of 1,679 bits, which if displayed
as 73 rows by 23 columns looks like Fig. 9.1 (the shading has been added on the
right to make the different parts of the image clearer).
The idea is that even with no shared cultural or linguistic roots one can rely on
basic counting, an awareness of prime numbers, elements, chemistry and physics –
which any being able to receive the message might reasonably be expected to
possess.
It is not clear how many human recipients could decipher the message without
help!
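The role of the prime numbers can be made concrete with a small sketch. Since 1,679 = 23 × 73 and both factors are prime, there are only two non-trivial ways to lay the bits out as a rectangle. The class name BitGrid, its methods and the 15-bit sample pattern are illustrative assumptions; the sample is not the Arecibo data.

```java
import java.util.ArrayList;
import java.util.List;

final class BitGrid {
    /** Returns every (rows, columns) pair with rows * columns == n and both greater than 1. */
    static List<int[]> factorPairs(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int rows = 2; rows < n; rows++) {
            if (n % rows == 0) {
                pairs.add(new int[] {rows, n / rows});
            }
        }
        return pairs;
    }

    /** Lays a bit string out row by row, drawing '1' as '#' and '0' as '.'. */
    static String render(String bits, int rows, int columns) {
        if (bits.length() != rows * columns) {
            throw new IllegalArgumentException("bit count does not match grid size");
        }
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns; c++) {
                out.append(bits.charAt(r * columns + c) == '1' ? '#' : '.');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // For 1,679 bits only two rectangular layouts exist: 23 x 73 and 73 x 23.
        for (int[] p : factorPairs(1679)) {
            System.out.println(p[0] + " rows x " + p[1] + " columns");
        }
        // A made-up 15-bit pattern (NOT the Arecibo data) drawn as 5 rows x 3 columns.
        System.out.print(render("010111010010010", 5, 3));
    }
}
```

A recipient who tries both layouts would see a recognisable picture in only one of them, which is the sense in which the message relies on basic counting.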
D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_9, © Springer-Verlag Berlin Heidelberg 2011
Fig. 9.1 Arecibo message as 1’s and 0’s (left) and as pixels – both black and white (centre) and
with shading added (right)
9.1 Re-Use of Digital Objects – Interoperability and Preservation
One of the interesting, and indeed useful, benefits of following OAIS and judging
digital preservation in terms of usability and understandability is that resources
which are needed for preservation also produce immediate benefits in terms of
wider, contemporary, use of the digital objects.
We justify this claim by noting that if one is familiar with a particular piece
of digitally encoded information then, apart from keeping the bits, one needs
nothing else. Representation Information – beyond that held in one’s mind – is
needed only where information is unfamiliar in some sense. This unfamiliarity
can arise from the passage of time – in which case we are in the realm of digital
preservation. Alternatively unfamiliarity can arise from distance in discipline
or experience – which can apply no matter what the difference in time – and is
necessary for usability by a wider community.
This is a very important consideration which should help to justify the expenditure
of those resources in preservation.
9.1.1 Relationship Between Preservation and (Re-)Use
Preservation of digitally encoded information requires that it continues to be usable
and understandable by a Designated Community. This has been extensively
discussed in the previous chapters. A Designated Community is defined by the
repository (see Sect. 6.2) and this definition is vital for the testability of the
effectiveness of the preservation activities of the archive. However the point to realise
is that the Representation Information Network can (perhaps easily) be extended
to that needed by another Designated Community – or perhaps more precisely, to
match the Knowledge Base of some other user community, for immediate use.
In other words although the digitally encoded information is not guaranteed by
the repository to remain usable by these other users, by making the Representation
Information required to fill the knowledge gap explicit, this is much more likely
to be the case. Moreover the types of Registry/Repository(ies) of Representation
Information which are described in this book will make it much easier to share the
Representation Information required. The repository holding the data does not itself
have to fill the gap; it only needs to make clear what the end points are of the
Representation Information Network it can provide.
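The idea of a knowledge gap and of end points can be sketched as a walk over a dependency graph. The sketch below is only illustrative: the class RepInfoNetwork, the method knowledgeGap and the particular edges between the items are assumptions made for the example (the item names are borrowed from the figures in this chapter, but the exact links of the real network are not reproduced here).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RepInfoNetwork {
    /** For each item, the Representation Information needed to understand it. */
    private final Map<String, List<String>> dependsOn;

    RepInfoNetwork(Map<String, List<String>> dependsOn) {
        this.dependsOn = dependsOn;
    }

    /** A small network with assumed (hypothetical) edges, for illustration only. */
    static RepInfoNetwork example() {
        return new RepInfoNetwork(Map.of(
                "FITS FILE", List.of("FITS STANDARD", "FITS DICTIONARY"),
                "FITS DICTIONARY", List.of("DICTIONARY SPECIFICATION"),
                "DICTIONARY SPECIFICATION", List.of("XML SPECIFICATION"),
                "XML SPECIFICATION", List.of("UNICODE SPECIFICATION")));
    }

    /**
     * Collects every item that is needed, directly or indirectly, to understand
     * the start object and is not already in the knowledge base. Items in the
     * knowledge base act as end points: the walk does not go beyond them.
     */
    Set<String> knowledgeGap(String start, Set<String> knowledgeBase) {
        Set<String> gap = new LinkedHashSet<>();
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String item = pending.pop();
            if (!seen.add(item)) {
                continue; // already expanded via another path
            }
            for (String needed : dependsOn.getOrDefault(item, List.of())) {
                if (!knowledgeBase.contains(needed)) {
                    gap.add(needed);
                    pending.push(needed);
                }
            }
        }
        return gap;
    }

    public static void main(String[] args) {
        // A community that already knows the FITS standard and XML has a smaller gap.
        System.out.println(example().knowledgeGap(
                "FITS FILE", Set.of("FITS STANDARD", "XML SPECIFICATION")));
    }
}
```

Changing the knowledge base passed to knowledgeGap models a different user community: the same network serves both, and only the set of items that must be supplied differs.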
This is not to say that everything becomes trivial. It is instructive to look at a
number of possibilities. One can first consider a single data object – which may
of course consist of several bit sequences (for example several files). After this the
implications for combining digitally encoded information may be analysed.
9.1.2 Digital Object Used By Itself
A digital object may be used by itself, for example a user may simply want to find
a particular fact from a dataset. For the sake of concreteness let us say that (s)he
wants to determine the photon counts at a certain position in the sky from data
captured by a particular astronomical instrument, and that data is held in a FITS
file. Other examples could include determining the character or the font used at a
particular position, say “the 25th character of the second paragraph of page 51”, in
a document. These are in many ways the simplest pieces of information which one
might wish to extract from a digital object. However if one can do this then one can
build up to the extraction of more complex pieces of information, using the concepts
of virtualisation discussed in Sect. 7.8.
The Representation Information Network (RIN) (Fig. 9.2 – an annotated version
of Fig. 6.4) indicates that a Java application is available to extract the numbers from
the data. Of course this RIN will also let us know which version of Java is needed
and so forth. If the user can run the Java application then it is a simple matter to
extract the number.

Fig. 9.2 Using the representation information network in the extraction of information from
digitally encoded information (FITS file)
[Figure not reproduced. Nodes: FITS FILE, FITS DICTIONARY, FITS STANDARD, SOFTWARE, JAVA VM, STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE. Annotations:
• If we can run this then we can run the Java software to extract the numbers
• If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM
• If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C
• If we can run this then we can use this in a generic application to extract the numbers
• If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers
• In principle we could use this, plus the Dictionaries in order to understand the keywords in order to extract the numbers]
Other options include:
A. if (s)he does not have the correct version of Java at hand then (s)he at least has
the option of trying to obtain it from another Registry/Repository – because (s)he
knows what is needed.
a. An important variant of this is the use of emulators, described in Sect. 7.9.
B. if the Java application cannot be run then it might be possible to take the Java
source code, if available, and convert it to some programming language, say the
C programming language, from which one can create an appropriate application.
C. if neither (A) nor (B) is possible, then a data description language (DDL) such
as EAST or DRB, together with the associated data dictionary, may be used.
Again there are a number of possibilities.
a. The easiest is that a generic application such as the one described in
Sect. 7.3.5 can use the data description to extract the information needed.
b. Otherwise one might have to read the DDL description, together with the
definition of that DDL, and the associated Data Dictionary or other piece of
Semantic Representation Information, and then write an appropriate application.
This would no doubt be harder, but at least one would not have to guess
at what information the digital object holds.
Some of these options are trivial – which would be very convenient for the user.
However if a trivial option is not available then at least the other options are
possible – the information can be extracted with considerable certainty and used for other
purposes.
9.2 Use of Existing Software
Option (A) above is an example of using existing software – albeit probably old
software. A more interesting example is the case where one wants to use information
from this digital object with one’s current favourite software. This may be because of
the additional functionality which that favourite software provides. The additional
functionality could include being able to combine that data with other data more
easily. Again one can imagine that this other software may be associated with (e.g.
in the Representation Information Network of) other archived data or it may be more
modern software – the argument applies equally.
Once again one can imagine several ways of doing this and these are described
next.
9.2.1 Migration/Transformation
Migration – or more precisely Transformation (using OAIS terminology) – involves
changing the bit sequences from the original to something else. Following the recent
revision of OAIS one can recognise that if this transformation is reversible then one
can be confident that no information has been lost. On the other hand non-reversible
transformations probably have lost information and someone must take
responsibility to confirm that the transformation adequately maintains the “important”
information. This is discussed in much more detail in Sect. 13.6.
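A round-trip test gives a practical, if limited, indication of reversibility: transform, apply the inverse, and compare the result with the original bits. The sketch below uses a change of byte order for 32-bit integers as the example transformation; the class and method names are assumptions, and passing the test on sample data is evidence of reversibility rather than a proof of it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ReversibilityCheck {
    /** Re-encodes a sequence of 32-bit integers from one byte order to another. */
    static byte[] reorder(byte[] data, ByteOrder from, ByteOrder to) {
        if (data.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        ByteBuffer in = ByteBuffer.wrap(data).order(from);
        ByteBuffer out = ByteBuffer.allocate(data.length).order(to);
        while (in.hasRemaining()) {
            out.putInt(in.getInt());
        }
        return out.array();
    }

    /** Transforms big-endian to little-endian and back, then compares with the original. */
    static boolean roundTrips(byte[] original) {
        byte[] transformed = reorder(original, ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN);
        byte[] restored = reorder(transformed, ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN);
        return Arrays.equals(original, restored);
    }

    public static void main(String[] args) {
        byte[] original = {0, 0, 1, 2, 0, 0, 0, 5};
        System.out.println(roundTrips(original)); // prints true
    }
}
```

For a non-reversible transformation no such inverse exists, which is why a person has to judge whether the “important” information has survived.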
For those with an eye for recursion, the ways in which the transformation
could be carried out are special cases of this sub-section, namely using a
single digital object. For example one can use existing software, the subject
of this sub-section, if there is software which can take in the original bit
sequences in order to perform the transformation.
One could alternatively use a data description language (DDL) description to
extract values from the original and write them out as the new bit sequences. This
could be done using generic applications as illustrated in Fig. 9.3 or else could be
hand-crafted.
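A very small sketch may help to show what “generic” means here: the code knows nothing about any particular format, and all knowledge of the layout comes from a description supplied as data. The Field class is a toy stand-in for one entry of a data description; it is not the syntax of EAST or DRB, and all names are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DescriptionDrivenReader {
    /** Toy description entry: where a signed integer field sits and how wide it is. */
    static final class Field {
        final String name;
        final int offset;
        final int sizeInBytes;

        Field(String name, int offset, int sizeInBytes) {
            this.name = name;
            this.offset = offset;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /** Extracts every described field from the original bit sequence. */
    static Map<String, Long> extract(byte[] data, List<Field> description, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(order);
        Map<String, Long> values = new LinkedHashMap<>();
        for (Field field : description) {
            long value;
            switch (field.sizeInBytes) {
                case 2: value = buffer.getShort(field.offset); break;
                case 4: value = buffer.getInt(field.offset); break;
                case 8: value = buffer.getLong(field.offset); break;
                default: throw new IllegalArgumentException("unsupported size: " + field.sizeInBytes);
            }
            values.put(field.name, value);
        }
        return values;
    }

    /** Writes the values out in a new encoding: one "name=value" text line per field. */
    static String toText(Map<String, Long> values) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Long> entry : values.entrySet()) {
            out.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x03, 0x00, 0x00, 0x01, 0x00};
        List<Field> description = List.of(new Field("rows", 0, 2), new Field("count", 2, 4));
        System.out.print(toText(extract(original, description, ByteOrder.BIG_ENDIAN)));
    }
}
```

Supporting a different original encoding then means writing a new description rather than new software, which is the attraction of the generic approach.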
The transformation chosen will of course be one which produces something
which can be used by the software which has been chosen to deal with the
information in the digitally encoded information. Authenticity evidence should
of course be provided by someone, providing values and other information
about selected Transformational Information Properties (also known as Significant
Properties), as discussed in Sect. 13.6.

Fig. 9.3 Using a generic application to transform from one encoding to another
[Figure not reproduced. Nodes: FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE, OTHER DDL DESCRIPTION, OTHER DDL DEFINITION, OTHER DICTIONARY, OTHER DICTIONARY SPECIFICATION. Labels: Original digital object plus data description; Generic transformation software; Transformed digital object; Data description for transformed digital object.]
9.2.2 Interfacing
A related but alternative way of using the digital object in one’s preferred software
is to use or create an appropriate programming interface. Whether or not this is
possible depends upon the flexibility of that preferred software – for example whether
or not it is possible to use plug-ins.
Instead of transforming the digital object as a whole one essentially does it on the
fly, treating only the piece that is needed. The advantage is that one might be dealing
with an object of many gigabytes, perhaps, in the case of scientific information,
many terabytes (1 terabyte = 1,024 gigabytes) or even more. If one is only interested
in a small part of the information then transforming the whole digital object may be
a waste of effort. Being able to transform only the part that is needed can be a
great saving in computation time and temporary disk storage in such circumstances.
If a large number of such objects are to be dealt with, the cumulative savings could
offset the effort needed to create the programming interface.
With luck this may be done automatically; the alternative is to do it manually.
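The saving from treating only the piece that is needed can be illustrated with a sketch that reads a single value from a large array by seeking directly to it. The layout assumed here (a fixed-size header followed by 32-bit big-endian integers in row-major order) and the names PartialReader, valueAt and demo are assumptions for the example, not a description of any particular format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

final class PartialReader {
    /** Reads one value from a row-major array without loading the rest of the file. */
    static int valueAt(Path file, long headerBytes, int columns, int row, int column)
            throws IOException {
        long offset = headerBytes + ((long) row * columns + column) * Integer.BYTES;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readInt(); // readInt interprets the four bytes as big-endian
        }
    }

    /** Builds a tiny 2 x 3 test file and reads the value at row 1, column 2. */
    static int demo() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("grid", ".bin");
            ByteBuffer buffer = ByteBuffer.allocate(8 + 6 * Integer.BYTES); // big-endian by default
            buffer.position(8); // leave an 8-byte header of zeros
            for (int value : new int[] {10, 20, 30, 40, 50, 60}) {
                buffer.putInt(value);
            }
            Files.write(tmp, buffer.array());
            return valueAt(tmp, 8, 3, 1, 2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort clean-up of the temporary file
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 60
    }
}
```

Only four bytes are read however large the file is, whereas transforming the whole object first would require reading and rewriting all of it.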
9.2.2.1 Manual Interfaces
The manual option may be described using the data shown in Sect. 19 as an example.
That data is essentially tabular. The EAST description allows one to extract
individual values. It is in principle fairly easy to implement the following Java
methods:
• public int getRowCount();
• public int getColumnCount();
• public Object getValueAt(int row, int column);
in order to extend the AbstractTableModel class [71].
If this is done then many Java applications are available to manipulate or display
the data (see Sect. 7.8.2.1.2).
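As a minimal sketch of such an interface, the three methods can be implemented over values that have already been extracted. The class name ExtractedTableModel and the in-memory array are assumptions; in practice getValueAt could instead pull each value from the digital object on demand, using the data description.

```java
import javax.swing.table.AbstractTableModel;

final class ExtractedTableModel extends AbstractTableModel {
    private static final long serialVersionUID = 1L;

    /** Values assumed to have been extracted from the digital object beforehand. */
    private final Object[][] cells;

    ExtractedTableModel(Object[][] cells) {
        this.cells = cells;
    }

    @Override
    public int getRowCount() {
        return cells.length;
    }

    @Override
    public int getColumnCount() {
        return cells.length == 0 ? 0 : cells[0].length;
    }

    @Override
    public Object getValueAt(int row, int column) {
        return cells[row][column];
    }

    public static void main(String[] args) {
        ExtractedTableModel model = new ExtractedTableModel(
                new Object[][] {{"a", 1, 2.5}, {"b", 2, 3.5}});
        System.out.println(model.getRowCount() + " x " + model.getColumnCount());
    }
}
```

An instance of this class can be handed to any component that accepts a TableModel, such as a JTable, without that component knowing anything about the original encoding.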
9.2.2.2 Automated
The automated option is the most convenient but is not often available. Essentially
the manual steps above are carried out automatically. Whether or not this is
possible depends, for example, on the amount and type of Representation Information
available and the tools which can use them.
9.3 Creation of New Software
Entirely new software may be needed in order to adequately deal with the digitally
encoded information. Techniques described in the previous sections to extract
information from the digital object are applicable here. The difference is that one needs
to design and implement the rest of the application, rather than having one already
available. Of course what the software does is dependent on one’s imagination and
the requirements.
9.4 Without Software
Software is not always needed, as illustrated by the data at the start of this chapter,
where one can imagine drawing each of the pixels by hand on squared graph paper.
Pencil and paper may be all that is needed – clearly this would only be practical for
small amounts of data.
9.5 Software as the Digital Object Being Preserved
When software itself is the digital object being preserved all the above applies.
However there are some additional considerations because doing some of what
is described in the previous sections could be very complex. This is because the
software which “uses” a software digital object is an operating system or virtual
machine.
The options discussed above become:
A. If (s)he does not have the correct version of the operating system at hand then (s)he at least
has the option of trying to obtain it from a Registry/Repository of Representation
Information – because (s)he knows what is needed.
a. An important variant of this is the use of operating systems running in
emulators, described in Sect. 7.9.
B. If the application cannot be run then it might be possible to take the source code,
if available, and port it to an available operating system or convert it to another
programming language.
The remaining option, of using a data description language, is not an easy one. An
example of this could be a Java application, where we could argue that Java byte
code is well described; this would require re-implementing a Java Virtual Machine –
quite a daunting task.
The testbed example in Sect. 19 provides further examples of using the
Representation Information Network for software.
9.6 Digital Archaeology, Digital Forensics and Re-Use
The above starts from the assumption that Representation Information is available,
as should be the case where digitally encoded information is being adequately
preserved.
There are times when one is not in such a fortunate position, for example where
one finds some digital data but does not know much about it. In such a case one may
be able to find the format (i.e. the appropriate Structure Representation Information),
as discussed in Sect. 7.4. What will be much more difficult to do is to find the
semantics associated with it. For example one may be able to discover that a file is a
PDF. This allows one to render the contents of the file. This does not mean that one
understands or can use the information it contains – for example the rendered text
might contain a string of “1”s and “0”s, as described at the start of this chapter, or it
might be in some unknown language.
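The distinction between finding the format and finding the semantics can be seen in a sketch of the first step. Many formats announce themselves in their leading bytes: a PDF file begins with the characters “%PDF-” and a FITS file begins with a header record whose keyword is SIMPLE. The class FormatSniffer and its two-entry table are illustrative assumptions, far short of a real format registry.

```java
import java.nio.charset.StandardCharsets;

final class FormatSniffer {
    /** Guesses a format from the first bytes of a file; returns "unknown" otherwise. */
    static String guessFormat(byte[] leadingBytes) {
        int length = Math.min(leadingBytes.length, 9);
        String head = new String(leadingBytes, 0, length, StandardCharsets.US_ASCII);
        if (head.startsWith("%PDF-")) {
            return "PDF";
        }
        if (head.startsWith("SIMPLE  =")) {
            return "FITS"; // keyword padded to eight characters, then '='
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessFormat(sample)); // prints PDF
    }
}
```

A result of "PDF" tells one how to render the bits and nothing more; what the rendered content means still has to come from Semantic Representation Information or from painstaking interpretation.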
In some cases this has not been an insuperable problem – an analogy may be
drawn with the interpretation of cuneiform – but this can take a considerable amount
of time and effort. Therefore this is a method of last resort.
9.7 Multiple Objects
Dealing with multiple pieces of digitally encoded information introduces more
complexity but the essential concepts have been covered; therefore no more will be
said.
9.8 Summary
Although not providing all the details, it is hoped that this chapter will have provided
the reader with an understanding of how digital objects may be used and re-used
over the long-term. Examples of some of these are provided in Part II. It may not
be a trivial process but, if the right Representation Information has been collected,
then at least it should be possible. It should also be clear that the formal description
techniques offer the possibility of making re-use easier for future users.