

Chapter 9

Understandability and Usability of Data

 I hear and I forget. I see and I remember. I do and I understand.

(Confucius)

Ensuring that digitally encoded information remains usable and understandable over

time is, together with authenticity, at the heart of digital preservation. The previous

chapter discussed some of the formal aspects of intelligibility. This chapter discusses

the complementary issue of usability of the data.

Usable means “capable of use” (OED), “available or convenient for use”

(www.dictionary.com).

In design, usability is the study of the ease with which people can employ a

particular tool or other human-made object in order to achieve a particular goal. In

human-computer interaction and computer science, usability studies the elegance

and clarity with which the interaction with a computer program or a web site is

designed (Wikipedia).

Here, by usable we mean that someone is able to do something sensible with the

information it contains. We recognise that this might not be easy – but at least it

should be possible to carry out.

One could of course use a digital object simply by printing out its constituent sequences of “1”s and “0”s on paper and using this to decorate one’s home. However it seems reasonable to suppose that this has little to do with the information content in the digital object – unless of course that is what it was designed for. For example the Arecibo message [130] was designed to be understood by extra-terrestrials. This consisted of a sequence of 1,679 bits, which if displayed as 73 rows by 23 columns looks like Fig. 9.1 (the shading has been added on the right to make the different parts of the image clearer).

The idea is that even with no shared cultural or linguistic roots one can rely on basic counting, an awareness of prime numbers, elements, chemistry and physics – which any being able to receive the message might reasonably be expected to possess. (Since 1,679 is the product of the primes 23 and 73, a 73 by 23 grid and its transpose are the only rectangular layouts.)

It is not clear how many human recipients could decipher the message without help!
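The layout step can be illustrated with a short Java sketch. It is only an illustration under stated assumptions: the class and method names (`BitGrid`, `gridShapes`, `render`) are invented here, and the 15-bit string in `main` is a toy message, not the Arecibo data.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: lay a flat bit string out as a grid whose sides are factors of its length. */
class BitGrid {

    /** Returns every {rows, columns} pair with rows * columns == length, excluding 1 x length. */
    static List<int[]> gridShapes(int length) {
        List<int[]> shapes = new ArrayList<>();
        for (int rows = 2; rows < length; rows++) {
            if (length % rows == 0) {
                shapes.add(new int[] {rows, length / rows});
            }
        }
        return shapes;
    }

    /** Splits the bit string into rows of the given width, drawing 1 as '#' and 0 as '.'. */
    static List<String> render(String bits, int columns) {
        if (bits.length() % columns != 0) {
            throw new IllegalArgumentException("length is not a multiple of " + columns);
        }
        List<String> rows = new ArrayList<>();
        for (int start = 0; start < bits.length(); start += columns) {
            String row = bits.substring(start, start + columns);
            rows.add(row.replace('1', '#').replace('0', '.'));
        }
        return rows;
    }

    public static void main(String[] args) {
        // 1,679 has exactly two non-trivial factorisations: 23 x 73 and 73 x 23.
        for (int[] shape : gridShapes(1679)) {
            System.out.println(shape[0] + " rows x " + shape[1] + " columns");
        }
        // Toy 15-bit message (hypothetical, not the Arecibo bits) as 5 rows of 3 columns.
        render("010111010010010", 3).forEach(System.out::println);
    }
}
```

A recipient who tried both candidate shapes would presumably recognise the intended one because only that layout produces a structured picture.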

D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_9, © Springer-Verlag Berlin Heidelberg 2011


Fig. 9.1 Arecibo message as 1’s and 0’s (left) and as pixels – both black and white (centre) and with shading added (right)

9.1 Re-Use of Digital Objects – Interoperability and Preservation

One of the interesting, and indeed useful, benefits of following OAIS and judging digital preservation in terms of usability and understandability is that resources which are needed for preservation also produce immediate benefits in terms of wider, contemporary use of the digital objects.

We justify this claim by noting that if one is familiar with a particular piece

of digitally encoded information then, apart from keeping the bits, one needs

nothing else. Representation Information – beyond that held in one’s mind – is

needed only where information is unfamiliar in some sense. This unfamiliarity

can arise from the passage of time – in which case we are in the realm of digital preservation. Alternatively unfamiliarity can arise from distance in discipline or experience – which can apply no matter what the difference in time – in which case the Representation Information is necessary for usability by a wider community.

This is a very important consideration which should help to justify the expendi-

ture of those resources in preservation.

 9.1.1 Relationship Between Preservation and (Re-)Use

Preservation of digitally encoded information requires that it continues to be usable

and understandable by a Designated Community. This has been extensively dis-

cussed in the previous chapters. A Designated Community is defined by the

repository (see Sect. 6.2) and this definition is vital for the testability of the effec-

tiveness of the preservation activities of the archive. However the point to realise

is that the Representation Information Network can (perhaps easily) be extended

to that needed by another Designated Community – or perhaps more precisely, to

match the Knowledge Base of some other user community, for immediate use.

In other words although the digitally encoded information is not guaranteed by

the repository to remain usable by these other users, by making the Representation

Information required to fill the knowledge gap explicit, this is much more likely

to be the case. Moreover the types of Registry/Repository(ies) of Representation

Information which are described in this book will make it much easier to share the

Representation Information required. The repository holding the data does not itself have to fill the gap; it needs only to make clear where the Representation Information Network that it can provide ends.

This is not to say that everything becomes trivial. It is instructive to look at a

number of possibilities. One can first consider a single data object – which may

of course consist of several bit sequences (for example several files). After this the

implications for combining digitally encoded information may be analysed.

 9.1.2 Digital Object Used By Itself 

A digital object may be used by itself; for example, a user may simply want to find

a particular fact from a dataset. For the sake of concreteness let us say that (s)he

wants to determine the photon counts at a certain position in the sky from data

captured by a particular astronomical instrument, and that data is held in a FITS

file. Other examples could include determining the character or the font used at a

particular position, say “the 25th character of the second paragraph of page 51”, in

a document. These are in many ways the simplest pieces of information which one

might wish to extract from a digital object. However if one can do this then one can

build up to the extraction of more complex pieces of information, using the concepts

of virtualisation discussed in Sect. 7.8.

The Representation Information Network (RIN) (Fig. 9.2 – an annotated version of Fig. 6.4) indicates that a Java application is available to extract the numbers from the data. Of course this RIN will also let us know which version of Java is needed and so forth. If the user can run the Java application then it is a simple matter to extract the number.

[Figure 9.2: network diagram. Nodes: FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE. Annotations on the diagram:

• If we can run this then we can run the Java software to extract the numbers

• If we cannot run this then we can use an emulator or use its RepInfo to re-create a Java VM

• If we cannot run the Java Virtual Machine then we use this source code to re-write in another programming language such as C

• If we can run this then we can use this in a generic application to extract the numbers

• If we cannot run the DDL software then we can look at the DDL definition and write some software to extract the numbers

• In principle we could use this, plus the Dictionaries in order to understand the keywords in order to extract the numbers]

Fig. 9.2 Using the representation information network in the extraction of information from digitally encoded information (FITS file)

Other options include:

A. if (s)he does not have the correct version of Java at hand then (s)he at least has the option of trying to obtain it from another Registry/Repository – because (s)he knows what is needed.

a. An important variant of this is the use of emulators, described in Sect. 7.9.

B. if the Java application cannot be run then it might be possible to take the Java

source code, if available, and convert it to some programming language, say the

C programming language, from which one can create an appropriate application.

C. if neither (A) nor (B) is possible, then a data description language (DDL) such

as EAST or DRB, together with the associated data dictionary, may be used.

Again there are a number of possibilities.

a. The easiest is that a generic application such as the one described in

Sect. 7.3.5 can use the data description to extract the information needed.

b. Otherwise one might have to read the DDL description, together with the definition of that DDL, and the associated Data Dictionary or other piece of Semantic Representation Information, and then write an appropriate application. This would no doubt be harder, but at least one would not have to guess at what information the digital object holds.

Some of these options are trivial – which would be very convenient for the user. However if a trivial option is not available then at least the other options are possible – the information can be extracted with considerable certainty and used for other purposes.
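The ordering of options (A) to (C) amounts to trying the cheapest route first and falling back to harder ones. A minimal Java sketch of that idea follows. Everything in it is hypothetical: the class `ExtractionFallback`, the route labels and the value 42 are placeholders, and each `Supplier` stands in for whatever real work a route would involve (running the application, porting the source, interpreting a DDL description).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Sketch: try extraction routes in order of preference and report the first that works. */
class ExtractionFallback {

    /** Returns the label and value of the first route that yields a result, if any. */
    static Optional<Map.Entry<String, Long>> firstAvailable(
            Map<String, Supplier<Optional<Long>>> routes) {
        for (Map.Entry<String, Supplier<Optional<Long>>> route : routes.entrySet()) {
            Optional<Long> value = route.getValue().get();
            if (value.isPresent()) {
                return Optional.of(Map.entry(route.getKey(), value.get()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so routes are tried from easiest to hardest.
        Map<String, Supplier<Optional<Long>>> routes = new LinkedHashMap<>();
        // Placeholder outcomes: here the first two routes are assumed to be unavailable.
        routes.put("A: run the existing Java application", Optional::empty);
        routes.put("B: port the Java source code", Optional::empty);
        routes.put("C.a: generic DDL-driven application", () -> Optional.of(42L));
        routes.put("C.b: hand-written reader from the DDL description", () -> Optional.of(42L));

        firstAvailable(routes).ifPresentOrElse(
                hit -> System.out.println(hit.getKey() + " -> " + hit.getValue()),
                () -> System.out.println("no route available"));
    }
}
```

Because the suppliers are evaluated lazily, the harder routes are never attempted once an easier one has succeeded.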

9.2 Use of Existing Software

Option (A) above is an example of using existing software – albeit probably old

software. A more interesting example is the case where one wants to use information from this digital object with one’s current favourite software. This may be because of

the additional functionality which that favourite software provides. The additional

functionality could include being able to combine that data with other data more

easily. Again one can imagine that this other software may be associated with (e.g.

in the Representation Information Network of) other archived data or it may be more

modern software – the argument applies equally.

Once again one can imagine several ways of doing this and these are described

next.

 9.2.1 Migration/Transformation

Migration – or more precisely Transformation (using OAIS terminology) – involves

changing the bit sequences from the original to something else. Following the recent

revision of OAIS one can recognise that if this transformation is reversible then one

can be confident that no information has been lost. On the other hand non-reversible

transformations probably have lost information and someone must take respon-

sibility to confirm that the transformation adequately maintains the “important”

information. This is discussed in much more detail in Sect. 13.6.
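One simple way to gain confidence that a transformation is reversible is a round-trip check: transform, apply the inverse, and compare the result with the original bits. The Java sketch below illustrates this under assumptions that are not from the text: the "original" is a sequence of big-endian 32-bit integers, the "transformed" form is a comma-separated text line, and the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.StringJoiner;

/** Sketch: check reversibility of a transformation by a bit-for-bit round trip. */
class ReversibleTransformation {

    /** Forward transformation: big-endian 32-bit integers to comma-separated text. */
    static String toText(byte[] binary) {
        ByteBuffer buffer = ByteBuffer.wrap(binary); // ByteBuffer is big-endian by default
        StringJoiner out = new StringJoiner(",");
        while (buffer.remaining() >= Integer.BYTES) {
            out.add(Integer.toString(buffer.getInt()));
        }
        return out.toString(); // any trailing bytes (fewer than 4) are silently dropped
    }

    /** Inverse transformation: comma-separated text back to big-endian 32-bit integers. */
    static byte[] toBinary(String text) {
        if (text.isEmpty()) {
            return new byte[0];
        }
        String[] fields = text.split(",");
        ByteBuffer buffer = ByteBuffer.allocate(fields.length * Integer.BYTES);
        for (String field : fields) {
            buffer.putInt(Integer.parseInt(field));
        }
        return buffer.array();
    }

    /** True if transforming and then inverting reproduces the original bit sequence. */
    static boolean roundTrips(byte[] original) {
        return Arrays.equals(original, toBinary(toText(original)));
    }

    public static void main(String[] args) {
        byte[] wholeInts = ByteBuffer.allocate(12).putInt(1).putInt(-2).putInt(1679).array();
        System.out.println("12 bytes round-trip: " + roundTrips(wholeInts));

        // One extra byte does not fit the 4-byte layout and is lost by toText.
        byte[] withTrailingByte = Arrays.copyOf(wholeInts, 13);
        System.out.println("13 bytes round-trip: " + roundTrips(withTrailingByte));
    }
}
```

A successful round trip on particular inputs is evidence rather than proof; the second case shows how a transformation can quietly lose information for inputs outside the layout it assumes.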

For those with an eye for recursion, the ways in which the trans-

formation could be carried out are special cases of this sub-section,

namely using a single digital object. For example one can use exist-

ing software, the subject of this sub-section, if there is software

which can take in the original bit sequences in order to perform the

transformation.

One could alternatively use a data description language (DDL) description to extract values from the original and write them out as the new bit sequences. This could be done using generic applications as illustrated in Fig. 9.3 or else could be hand-crafted.

The transformation chosen will of course be one which produces something which can be used by the software which has been chosen to deal with the information in the digitally encoded information. Authenticity evidence should of course be provided by someone, providing values and other information about selected Transformational Information Properties (also known as Significant Properties), as discussed in Sect. 13.6.

[Figure 9.3: network diagram. Nodes as in Fig. 9.2 (FITS FILE, FITS DICTIONARY, FITS STANDARD, PDF SOFTWARE, JAVA VM, PDF STANDARD, FITS JAVA SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION, DDL DESCRIPTION, DDL DEFINITION, DDL SOFTWARE), plus OTHER DDL DESCRIPTION, OTHER DDL DEFINITION, OTHER DICTIONARY and OTHER DICTIONARY SPECIFICATION. Labels: Generic transformation software; Original digital object plus data description; Transformed digital object; Data description for transformed digital object.]

Fig. 9.3 Using a generic application to transform from one encoding to another

 9.2.2 Interfacing

A related but alternative way of using the digital object in one’s preferred software

is to use or create an appropriate programming interface. Whether or not this is pos-

sible depends upon the flexibility of that preferred software – for example whether

or not it is possible to use plug-ins.

Instead of transforming the digital object as a whole one essentially does it on the

fly, treating only the piece that is needed. The advantage is that one might be dealing

with an object of many gigabytes, perhaps, in the case of scientific information,

many terabytes (1 terabyte = 1,024 gigabytes) or even more. If one is only interested

in a small part of the information then transforming the whole digital object may be

a waste of effort. Being able to transform only the part that is needed can be a

great saving in computation time and temporary disk storage in such circumstances.


If a large number of such objects are to be dealt with, the cumulative savings could

offset the effort needed to create the programming interface.
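The saving from treating only the piece that is needed can be seen in a small Java sketch that seeks directly to one value instead of reading the whole object. The file layout used here (a headerless, row-major table of big-endian 32-bit integers) is an assumption made purely for illustration; it is not FITS or any format discussed in this chapter, and the class and method names are invented.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: read a single cell from a large raw table without loading the rest. */
class OnTheFlyAccess {

    /** Reads the 32-bit value at (row, column) of a headerless, row-major, big-endian table. */
    static int valueAt(Path file, int columns, int row, int column) throws IOException {
        long offset = ((long) row * columns + column) * Integer.BYTES;
        try (RandomAccessFile table = new RandomAccessFile(file.toFile(), "r")) {
            table.seek(offset);     // jump straight to the cell of interest
            return table.readInt(); // readInt interprets the four bytes as big-endian
        }
    }

    /** Writes a small demonstration table whose cell (r, c) holds r * 10 + c. */
    static Path writeDemoTable(int rows, int columns) throws IOException {
        Path file = Files.createTempFile("demo-table", ".bin");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < columns; c++) {
                    out.writeInt(r * 10 + c);
                }
            }
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeDemoTable(3, 4);
        System.out.println("cell (2, 1) = " + valueAt(file, 4, 2, 1));
        Files.delete(file);
    }
}
```

Only four bytes are read regardless of the size of the file, which is the property that matters when the object runs to gigabytes or terabytes.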

With luck this may be done automatically; the alternative is to do it manually.

9.2.2.1 Manual Interfaces

The manual option may be described using the data shown in Sect. 19 as an exam-

ple. That data is essentially tabular. The EAST description allows one to extract

individual values. It is in principle fairly easy to implement the following Java

methods:

• public int getRowCount();

• public int getColumnCount();

• public Object getValueAt(int row, int column);

in order to extend the AbstractTableModel class [71].

If this is done then many Java applications are available to manipulate or display

the data (see Sect. 7.8.2.1.2).
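As a minimal sketch of such an extension, the class below implements the three methods over an in-memory array. The array is a stand-in assumption: in the scenario described above, each value would instead be obtained through the EAST description of the data. The class name `ExtractedTableModel` is hypothetical.

```java
import javax.swing.table.AbstractTableModel;

/** Sketch: expose tabular values through the three methods AbstractTableModel requires. */
class ExtractedTableModel extends AbstractTableModel {

    // Stand-in for values that would really be extracted via the data description.
    private final double[][] cells;

    ExtractedTableModel(double[][] cells) {
        this.cells = cells;
    }

    @Override
    public int getRowCount() {
        return cells.length;
    }

    @Override
    public int getColumnCount() {
        return cells.length == 0 ? 0 : cells[0].length;
    }

    @Override
    public Object getValueAt(int row, int column) {
        return cells[row][column]; // autoboxed to Double
    }

    public static void main(String[] args) {
        ExtractedTableModel model =
                new ExtractedTableModel(new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        System.out.println(model.getRowCount() + " x " + model.getColumnCount());
        System.out.println("value at (1, 0): " + model.getValueAt(1, 0));
    }
}
```

An instance can then be passed to any component that accepts a `TableModel`, for example `new JTable(model)`.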

9.2.2.2 Automated

The automated option is the most convenient but is not often available. Essentially the manual steps above are carried out automatically. Whether or not this is possible depends, for example, on the amount and type of Representation Information available and the tools which can use it.

9.3 Creation of New Software

Entirely new software may be needed in order to adequately deal with the digitally

encoded information. Techniques described in the previous sections to extract infor-

mation from the digital object are applicable here. The difference is that one needs

to design and implement the rest of the application, rather than having one already

available. Of course what the software does is dependent on one’s imagination and

the requirements.

9.4 Without Software

Software is not always needed, as illustrated by the Arecibo message at the start of this chapter,

where one can imagine drawing each of the pixels by hand on squared graph paper.

Pencil and paper may be all that is needed – clearly this would only be practical for

small amounts of data.


9.5 Software as the Digital Object Being Preserved

When software itself is the digital object being preserved all the above applies. However there are some additional considerations because doing some of what is described in the previous sections could be very complex. This is because the software which “uses” a software digital object is an operating system or virtual machine.

The options discussed above become:

A. If (s)he does not have the correct version of the operating system at hand then (s)he at least has the option of trying to obtain it from a Registry/Repository of Representation Information – because (s)he knows what is needed.

a. An important variant of this is the use of operating systems running in emulators, described in Sect. 7.9.

B. If the application cannot be run then it might be possible to take the source code,

if available, and port it to an available operating system or convert it to another

programming language.

The remaining option, of using a data description language, is not an easy one. An example of this could be a Java application, where we could argue that Java byte code is well described; this would require re-implementing a Java Virtual Machine – quite a daunting task.

The testbed example in Sect. 19 provides further examples of using the

Representation Information Network for software.

9.6 Digital Archaeology, Digital Forensics and Re-Use

The above starts from the assumption that Representation Information is available, as should be the case where digitally encoded information is being adequately preserved.

There are times when one is not in such a fortunate position, for example where

one finds some digital data but does not know much about it. In such a case one may

be able to find the format (i.e. the appropriate Structure Representation Information),

as discussed in Sect. 7.4. What will be much more difficult to do is to find the

semantics associated with it. For example one may be able to discover that a file is a

PDF. This allows one to render the contents of the file. This does not mean that one

understands or can use the information it contains – for example the rendered text

might contain a string of “1”s and “0”s, as described at the start of this chapter, or it

might be in some unknown language.

In some cases this has not been an insuperable problem – an analogy may be

drawn with the interpretation of cuneiform – but this can take a considerable amount

of time and effort. Therefore this is a method of last resort.
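The gap between finding the format and finding the semantics can be made concrete with a small Java sketch of format identification by leading bytes. It relies on two well-known signatures: PDF files begin with "%PDF-" and FITS files begin with the keyword "SIMPLE" padded to eight characters and followed by "=". The class and method names are hypothetical, and a realistic identification tool would check far more than this.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: guess a file format from its first few bytes. */
class FormatSniffer {

    /** Returns a format label based only on the leading bytes, or "unknown". */
    static String guessFormat(byte[] leadingBytes) {
        int length = Math.min(leadingBytes.length, 16);
        String head = new String(leadingBytes, 0, length, StandardCharsets.US_ASCII);
        if (head.startsWith("%PDF-")) {
            return "PDF";
        }
        if (head.startsWith("SIMPLE  =")) {
            return "FITS";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] pdfLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] fitsLike = "SIMPLE  =                    T".getBytes(StandardCharsets.US_ASCII);
        byte[] other = {0, 1, 2, 3};
        System.out.println(guessFormat(pdfLike));
        System.out.println(guessFormat(fitsLike));
        System.out.println(guessFormat(other));
    }
}
```

Such a check yields at most the Structure Representation Information; it says nothing about what the rendered content means, which is the harder problem described above.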


9.7 Multiple Objects

Dealing with multiple pieces of digitally encoded information introduces more complexity, but the essential concepts have been covered, so no more will be said here.

9.8 Summary

Although not providing all the details, it is hoped that this chapter will have provided

the reader with an understanding of how digital objects may be used and re-used

over the long term. Examples of some of these are provided in Part II. It may not be a trivial process but, if the right Representation Information has been collected, then at least it should be possible. It should also be clear that the formal description techniques offer the possibility of making re-use easier for future users.