
Audience and Access: Evaluating Access Tools for Multimedia Collections

Douglas Tudhope, Daniel Cunliffe
Hypermedia Research Unit
University of Glamorgan

Abstract

Major efforts are underway to digitise cultural heritage collections for the Internet, and existing collections databases, previously the domain of the professional, are being unlocked to a new public. This raises the question of how people will try to access this new form of information and how we can assess the access points and tools we provide. While not claiming to provide a solution to these issues, this paper discusses relevant literature and its implications for cultural heritage information providers. Drawing on current research at the University of Glamorgan, the problem of how to evaluate novel access tools is discussed and alternative approaches compared. The paper reviews evaluation methodology from a computing perspective, illustrating issues with a case study of Web evaluation and initial thinking on a planned evaluation of a thesaurus-based retrieval prototype.

Introduction

Major efforts are underway to digitise cultural heritage collections for the Internet, and existing collections databases, previously the domain of the professional, are being unlocked to a new, wider public. This is occurring not only in the museum and gallery domain, but in digital library research. It raises the question of how people will try to access this new form of information and how we can assess the access points and tools we provide. While not claiming to provide a solution to these issues, this paper discusses relevant computing literature and its implications for cultural heritage information providers. Drawing on current research at the University of Glamorgan, the problem of how to evaluate novel access tools is discussed and alternative approaches compared. Some museums have invested significant effort in evaluation and are evolving traditional visitor techniques to deal with computer-based gallery interactives and Web sites. As a contribution to this effort, the paper reviews evaluation methodology from a computing perspective, illustrating issues with a case study of Web evaluation and initial thinking on a planned evaluation of a thesaurus-based retrieval prototype. We focus on the evaluation of an implemented computer application or prototype, as opposed to studies of user need or requirements (see McCorry and Morrison 1995; Morrison 1998; and the other papers in this session of the conference).

It would be convenient if rigorously following design guidelines obviated the need for evaluation. However, we are far from that situation - many computer horror stories involve a failure to evaluate a product until effectively too late in the lifecycle to make changes. Nor is there one prescribed method of evaluation.

Figure 1: Aspects of system acceptability (from Nielsen 1993, p25)


There are many different methods, each with its own set of advantages and disadvantages and suited to different stages of the lifecycle. For detailed discussion, see HCI texts dealing with evaluation issues (e.g., Nielsen 1993; Shneiderman 1998). It is important to be clear about precisely what you wish to evaluate - usability is only one aspect of human factors issues (see Figure 1).

Evaluation can be formative, intended to help refine a design, or summative, a final test of the system's suitability. Measures can collect quantitative and/or qualitative data. Quantitative evaluation data can include details of time taken, errors noted, or affective measures, and lends itself to statistical processing. Qualitative descriptions of user behaviour are more difficult to analyse but can lead to a richer understanding of the context and insights into reasons for mistakes and cognitive dimensions. The setting for the evaluation can vary along a continuum from a controlled usability lab equipped with one-way mirror, video recording and logging equipment to 'contextual' observations in a realistic workplace environment. The controlled setting may be suited to formally testing specific hypotheses and investigating causal relationships among experimental variables, such as the effect of different types of menu design or typography. A workplace setting may more faithfully reproduce typical situations, such as frequent interruption and task switching, and may sometimes yield more validity when attempting to generalise to user populations beyond the trial subjects.

Methods differ as to the personnel involved in an evaluation and the characteristics identified as critical by researchers. Frequently this is subject to very practical constraints such as availability and expense, although the consequences may not be apparent until later in the development. A seemingly successful evaluation may be misleading if findings are subsequently generalised to a user population differing in key aspects. Even the very definition of success in an evaluation is not always obvious. For example, the common measure of the number of 'hits' on a Web site may not correspond in any direct manner to business or organisational objectives. Notwithstanding these complications, almost any evaluation will yield some insights and the inevitable limitations should not be used as an excuse to pass over the issue.

Evaluation Methods

Evaluation methods are typically subdivided into analytical, survey, inspection, and observational categories. Analytical evaluation is conducted early in the lifecycle on a (semi) formal specification, based upon a physical and cognitive model of the operations a user will perform. Single layer models consist of a linear sequence of micro physical and cognitive 'actions', such as the movement of the mouse or the typing of a key. Usually the measure is the time taken to perform some task. The 'keystroke-level' model is a well-known example of a single layer model. Multi-layer models (such as GOMS - Goals, Operators, Methods, Selection Rules) are hierarchical, with tasks being divided into sub-tasks. No user testing is required with analytical evaluation. Such methods tend to focus on error-free performance by expert users and, while particularly valuable in some application areas, would typically be combined with another, more empirical method.
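
To make the analytical approach concrete, the short sketch below estimates a task time by summing operator times in the style of the keystroke-level model. The operator values (keystroke, pointing, homing, mental preparation) are approximate figures commonly quoted in the HCI literature, and the example task sequence is invented for illustration; it is not drawn from this paper.

    # Sketch of a keystroke-level style estimate: sum approximate operator times
    # over a predicted sequence of user actions. Operator values are rough
    # figures commonly quoted in the HCI literature; the task sequence below is
    # invented for illustration.
    OPERATOR_SECONDS = {
        "K": 0.28,   # press a key or button (average typist)
        "P": 1.10,   # point with the mouse to a target
        "H": 0.40,   # move hands between keyboard and mouse
        "M": 1.35,   # mental preparation before a chunk of actions
    }

    def estimate_seconds(actions: str) -> float:
        """Estimate task time for a string of operators, e.g. 'MPK'."""
        return sum(OPERATOR_SECONDS[op] for op in actions)

    # Hypothetical task: think, point at a search box and click, type a
    # six-letter term, think again, then point at the search button and click.
    print(estimate_seconds("MPK" + "K" * 6 + "MPK"))   # roughly 7 seconds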


Survey methods include the interview and questionnaire and are probably the most well known to the lay population. There is an extensive literature on questionnaire design - a usual recommendation is to conduct a pilot before embarking on the main study. Design varies from fixed format to open-ended, where the respondent is invited to give their own opinions or suggestions. Although response rate is always an issue, survey methods can be quite cost effective. It is important to remember that they are designed to elicit an opinion, usually post hoc, and that there may sometimes be systematic reasons why respondents are unable or unwilling to remember the details sought. See Shneiderman (1998) for more details and further references on analytical and survey evaluation.

Inspection methods involve one or more evaluators moving through an interface and assessing it for conformance to a pre-defined set of guidelines or heuristics. Inspection methods are a widely used form of evaluation for a number of reasons: some inspection methods require less formal training for evaluators than other evaluation methods, they can be used throughout the development process, they do not require the use of test users for experimentation, and they find a large number of potential usability problems (Sears 1997). Gray and Salzman (1998) characterise inspection methods according to the type of guidelines they use (level of abstraction and perspective) and whether or not the application of guidelines is guided by scenarios (scenarios focus the inspection on user tasks rather than being an exhaustive assessment):

Guidelines                            Scenario: No            Scenario: Yes
None                                  Expert review           Expert walkthrough
Short list                            Heuristic evaluation    Heuristic walkthrough
Long list                             Guidelines              Guidelines walkthrough
Information processing perspective    N/A                     Cognitive walkthrough

Table 1: Gray and Salzman's characterisation of inspection methods


The number of evaluators needed to apply a particular inspection method, and the expertise required by the evaluators, varies. For example, Nielsen (1993, p156) suggests that about five, and at least three, independent inspections are required for an effective heuristic evaluation. Nielsen also recommends that the evaluators should have experience in performing such evaluations, but suggests that useful results can be achieved by non-experts. Different inspection methods use different numbers of guidelines, guidelines at different levels of abstraction and guidelines from different perspectives. For heuristic evaluation, for example, Nielsen recommends using around ten high-level heuristics. The effective use of high-level, abstract heuristics, such as "Speak the users' language" (Nielsen 1993), requires a greater element of professional judgement; less experienced evaluators may find a larger, more detailed set of guidelines more appropriate. While HCI research has produced a number of sets of high-level heuristics and also low-level guidelines (e.g., Smith and Mosier 1986), the simple application of these guidelines to Web sites and multimedia products is not always appropriate nor sufficient. More specialist forms of inspection method have emerged for particular domains, with faceted lists of guidelines. For example, the CIDOC Multimedia Working Group (Trant 1997) has developed multimedia evaluation criteria for kiosk/CD/Web applications in the museum domain, with 49 checkpoints arranged by category: Content, Functionality, Interface, Implementation, Overall. In the CBL (computer-based learning) domain, Barker and King (1993) have developed an evaluation checklist and supporting notes for interactive multimedia courseware. Typically such lists of heuristics require subjective assessment on the part of the evaluator, e.g. "Is the presentation consistent? Clear?" The scope of the guidelines is also an important consideration: some focus exclusively on usability issues, whilst others include consideration of wider issues such as the pricing of the product and the delivery platforms.
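
The 'about five, at least three' recommendation is often justified by a simple diminishing-returns argument in which each independent evaluator finds a fixed proportion of the existing problems. The sketch below illustrates that reasoning only; the 31% detection rate is a commonly quoted average from the published studies, not a figure from this paper.

    # Sketch: expected proportion of usability problems found by i independent
    # inspections, using the commonly cited model 1 - (1 - p)^i. The detection
    # rate p = 0.31 is an illustrative average, not a value from this paper.
    def proportion_found(evaluators: int, detection_rate: float = 0.31) -> float:
        return 1.0 - (1.0 - detection_rate) ** evaluators

    for n in (1, 3, 5, 10):
        print(f"{n:2d} evaluators -> {proportion_found(n):.0%} of problems")
    # Three evaluators find roughly two thirds of the problems and five find
    # over 80%, after which adding further evaluators pays diminishing returns.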

Recently, highly specific guidelines have been proposed based on an underlying model of the domain. Faraday and Sutcliffe (1997) use guidelines for timeline-based multimedia presentations based on a cognitive model of multimedia comprehension. They focus on the micro 'moment-by-moment' detailed design decisions on the choice and synchronisation of different media types (text, image, sound, animation, etc.) and highlight the importance of 'contact points' for relating audio and visual channels. Garzotto and Matera (1997) are developing a systematic method for inspecting navigation issues in hypermedia systems (SUE). In the inspection phase, SUE uses a set of specific abstract tasks which codify the inspection patterns and behaviours of experienced evaluators in a form that can be applied by novice evaluators. Whilst the philosophy of the approach is to support novice inspections, currently SUE involves a preparatory phase requiring the use of a formal hypermedia model, typically not appropriate for novice evaluators. However, work is ongoing and the aim is to develop a method which yields high levels of inter-evaluator consistency.

Whilst the fact that inspection methods do not require test users has benefits in terms of cost and convenience, it is also a weakness. Concerns have been raised regarding the number of false usability problems identified by such methods (Bailey et al. 1992, cited in Gray and Salzman 1998), that is, problems that real users performing real tasks would not encounter. This may also in part be due to the application of inappropriate guidelines or the inappropriate application of guidelines. The selection and application of guidelines can usefully be informed by consideration of users and their tasks. Another consequence of not involving users is that usability problems resulting from users behaving in ways unanticipated by the developer or evaluators will not be identified.

Observational techniques can include various data gathering methods, ranging from simple direct observation by the researcher (taking field notes), to taped 'think-aloud' sessions (concurrent or retrospective) or co-evaluation (two users working together with their conversation recorded), to videotape, interaction logging, and even eye tracking. These methods are sometimes classed as 'empirical' evaluation: some manner of representative user is observed operating an implemented system or prototype. Dealing with the quantity of data gathered is often a practical problem for subsequent analysis, and sometimes privacy concerns may be an issue. Another key issue is the extent and manner to which the users may be affected by the observation process itself (the 'Hawthorne effect'). This can arise in different guises with the different data gathering methods. A very practical concern is the need to reassure users that the focus of the evaluation is the system rather than them, and that should they feel uncomfortable they may stop or ask questions at any time.

Museum Evaluation

Reports of museum evaluation studies can be found in relevant cultural heritage publications, including mda Conference Proceedings, the International Laboratory of Visitor Studies Review, Archives and Museum Informatics, and Computers and the History of Art. Some computing literature has addressed museum evaluation issues specifically. One of the first studies was Hardman's (1989) evaluation of the early hypermedia system Glasgow Online, which employed direct observation, think-aloud and videotaping alongside an analysis of hypertext structure. A good example of a kiosk evaluation by an Apple HCI team, involving transaction logging and iterative design, can be found in Salomon (1990). Garzotto and Matera's (1997) inspection method, discussed above, was applied to several cultural heritage CDs. Fidel (1997) discusses image retrieval from photographic archives, emphasising the importance of user tasks and context on the design of performance measures.


Shneiderman et al. (1989) compared three museum evaluations and (among other issues) emphasised the critical issue of the initial instructions or training given to users. This point also arose in a trial evaluation we conducted of walk-up use of an early prototype exploring the use of SHIC (the Social History and Industrial Classification) for presentation and access purposes (Tudhope et al. 1994, 1998).

Web Evaluation

The Web poses a number of challenges for developers:

• The element of user control over the presentation of the Web page, the point of entry to the Web site and the pages visited.
• Lack of control over the user's hardware and software platform.
• Lack of control over network and server response times. Speed of response has been identified as a major usability issue for users (GVU 1998).
• The relative importance of content, visual appearance and usability can be both site and user specific.
• Usability must be considered at both site level and page level. Where a number of sites form a collaborative network, inter-site usability must also be considered.
• Although the current cultural orientation of the Web is towards the USA, there is a growing realisation of the need to consider the potential worldwide audience and embrace a wide variety of cultural conventions, languages, and so on.
• Whilst it is generally accepted that the two main modes of user interaction are directed searching and exploratory browsing, users actually exhibit a range of behaviours which must be supported (Marchionini 1995; Smith et al. 1997).
• The need for definitions of site success which are more satisfactory than simple hit counts.
• The wide range and variable quality of guidance available, ignorance of existing research work, inappropriate application of research work, and the difficulty in applying theoretical research to practical developments.
• In large Web sites it may not be practical to evaluate each individual page due to resource constraints. A focus on user scenarios or on key pages can reduce the number of individual pages that must be evaluated.

A recent snapshot of HCI research on Web usability can be found in Shum and McKnight (1997), for example Shneiderman (1997). Bevan (1998) lists a good set of practical Web-specific guidelines, while Smith (1996) provides a more theoretical perspective on hypertext evaluation and a discussion of an experiment measuring 'lostness' in hyperspace. To illustrate practical issues involved in Web evaluation, we now consider a recent 'low-budget' evaluation conducted at Glamorgan.

Case Study

The Web site developed provides information about The New Review of Hypermedia and Multimedia, an annual review journal (NRHM 1999). It is a relatively small site, consisting of approximately one hundred and sixty pages, mainly paper abstracts. No formal development model was used, though Bevan's paper (Bevan 1998) was used as an informal guide. The development of the site was completed without any formal user input, relying on the intuitions of the information providers and competitive analysis of existing Web sites to provide user needs models.

The summative evaluation of the site employed a combination of methods, selected according to the following considerations:

• The evaluation had to be completed in approximately two weeks.
• The intended users of the Web site were spread across the globe.
• The perceived importance of testing common user tasks.
• Reported methods used in similar evaluations.
• The reported advantages and disadvantages of the methods.
• The evaluation team consisted of one inexperienced person, and there was no access to specialist HCI testing facilities or equipment.

Based on these considerations, four techniques were selected: direct observation of usability tests; heuristic evaluation; on-line questionnaire; and transaction log analysis.

Direct Observation

Within the constraints of the evaluation, the gathering of a test group of real users proved problematic. In order to perform an evaluation with the proxy users that were available, a set of six task scenarios was constructed, each presenting a specific task (Kritou 1998, p37):

Assume that you are an author who wants to submit a paper and is looking for an appropriate journal. You have found the NRHM Web site; find out what topics can be submitted in the next issue and the guidelines for submitting a paper to this journal.

Evaluations were performed in the subjects' offices to provide a natural context. During the evaluation, subjects were encouraged to 'think aloud'; a video camera was used to record the users' utterances and on-screen behaviour. While the 'think aloud' protocol helps in understanding why users behave the way they do, most of the subjects reported that they were uncomfortable with it and that it interfered with their execution of the tasks. The evaluator also completed an observer's notebook during the evaluation. This was used to record the time taken for each task, whether the task was completed, whether the subject followed the optimum path (the path involving the fewest 'clicks') and the subject's affective state.


In practice the evaluator found it difficult to record all the information required, and some of the details gathered were of little practical worth. The evaluation was followed by a short interview during which the subjects were encouraged to comment generally on their experience and to express their subjective satisfaction.
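
The observer's notebook was in effect a structured record per task: time taken, completion, path length against an optimum, and affective notes. A minimal sketch of such a record is given below; the field names and the example task are hypothetical, not the instrument used in this study.

    # Sketch of a per-task record for a direct observation session. Field names
    # and the example values are hypothetical, not data from the NRHM study.
    from dataclasses import dataclass, field

    @dataclass
    class TaskObservation:
        task_id: str
        time_taken_s: float        # seconds from task start to finish or abandonment
        completed: bool
        clicks: int                # links the subject actually followed
        optimum_clicks: int        # fewest clicks known to reach the target page
        notes: list = field(default_factory=list)   # evaluator's free-text notes

        def excess_clicks(self) -> int:
            """How far the subject strayed from the optimum path."""
            return max(0, self.clicks - self.optimum_clicks)

    obs = TaskObservation("find-submission-guidelines", 95.0, True, 7, 3,
                          ["hesitated at home page", "used index rather than menu"])
    print(obs.excess_clicks())     # -> 4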

Heuristic Evaluation

The evaluator reviewed a number of well-established principles of usability and Web site design in order to derive a set of eight high-level heuristics. The evaluator then performed a page-by-page inspection of the entire site. Each heuristic was described by a heading (shown below) and a short paragraph highlighting key concerns.

• Consistency and conformance to standards.
• Recognition and predictability.
• Web pages should stand alone.
• Flexibility and efficiency of use.
• Effectiveness.
• Readability.
• Every page should express one topic or concept.
• Consider the global audience.

Log Analysis

The evaluation collected Web access logs over a period of sixteen days. In addition to simple 'hits per page' information, the evaluator was interested in trying to identify particular patterns of use. The evaluator encountered several difficulties with log analysis:

• IP addresses are not unique identifiers for users, therefore identifying users and tracking their behaviour over several interactive sessions is problematic.
• The log of accesses may not contain a full record of interactions if pages are cached.
• It is difficult to identify specific usability problems from the analysis of Web logs alone, and this analysis may be highly subjective.

The importance of capturing real user interactions should be stressed, even though we may lack effective tools for gathering and analysing them. One technique which may prove effective for sites with search facilities is the capture and analysis of search terms (Bevan 1998; Rosenfeld and Morville 1998, p173).
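
As a rough sketch of the kind of log processing involved (counting hits per page and pulling query terms out of search requests), the following assumes logs in the common Apache log format and a hypothetical /search path with a q parameter; real sites and log layouts will differ.

    # Sketch: tallying hits per page and extracting search terms from a Web
    # access log in common log format. The log file name, the /search path and
    # the "q" query parameter are assumptions for illustration.
    import re
    from collections import Counter
    from urllib.parse import urlparse, parse_qs

    LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3})')

    hits, terms = Counter(), Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LINE.match(line)
            if not m or m.group(3) != "200":
                continue
            url = urlparse(m.group(2))
            hits[url.path] += 1
            if url.path == "/search":                  # assumed search endpoint
                for term in parse_qs(url.query).get("q", []):
                    terms[term.strip().lower()] += 1

    print(hits.most_common(10))    # a crude 'hits per page' picture
    print(terms.most_common(10))   # what visitors are actually looking for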

On-line Questionnaire

An on-line questionnaire was created containing questions to gather three types of information: demographic information, technical information, and visit information. The questionnaire was prominently linked from the NRHM homepage, and e-mail was sent to various mailing lists and individuals.

Demographic information included occupation, age, locality and Internet experience. Technical information included the type of browser used and the speed of the user's Internet connection. As the intended audience was expected to be technically literate, this type of question was appropriate. Where users are technically naive they may not know the answers, potentially leading to non-completion of the questionnaire, guessed answers, or missing data. Visit information included the number of times the user had visited the site, their purpose in visiting the site, the page on which they entered the site, the pages they visited, how useful they found the pages they visited and which pages they bookmarked. These questions were intended to build up a more detailed picture of user behaviour. Users were also asked to rate their general satisfaction with the site and were offered the opportunity to make any general comments.

The two major concerns with on-line questionnaires and similar feedback mechanisms are the self-selecting nature of the sample and the response rate required to draw reliable conclusions. The response rate for the questionnaire was between 2.3% and 5.9%, depending on how the number of visitors is determined. Although this compares reasonably to the 2.0% rate reported in the evaluation of the Science Museum Web site (Thomas and Paterson 1998), the actual number of responses was low and it was impossible to draw reliable conclusions from the results.
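
The reason the response rate is quoted as a range rather than a single figure is that the denominator depends on how 'visitors' are counted from the access log. The numbers below are invented purely to illustrate how the choice of denominator moves the rate; they are not the NRHM figures.

    # Sketch: the same number of completed questionnaires divided by different
    # visitor estimates gives very different 'response rates'. All counts are
    # invented for illustration.
    responses = 15
    visitor_estimates = {
        "distinct hosts in the log": 300,      # optimistic: one host = one visitor
        "sessions (30-minute gap rule)": 500,  # a common heuristic for splitting sessions
        "raw page requests": 6000,             # pessimistic: every hit counted as a visit
    }
    for basis, visitors in visitor_estimates.items():
        print(f"{basis:32s} {responses / visitors:6.1%}")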

Comparison

A total of twenty-two potential usability problems were identified by the methods used (Table 2). Ten problems were identified solely by heuristic evaluation, nine were identified solely by direct observation, one was identified by heuristic evaluation and direct observation, one was identified by heuristic evaluation, direct observation and log analysis, and one was identified by all four methods.

The usability problem identified by the on-line questionnaire was contained in a general comment rather than in answer to a specific question. The two usability problems identified by log analysis relied heavily on the evaluator's subjective interpretation. These methods are best suited to capturing information about users and their behaviour, rather than identifying specific usability problems. Questionnaires can also be useful for determining whether a site is meeting the general needs of its users. The potential usability problems identified by direct observation include some probable false positives resulting from using test subjects who were proxy users. For instance, 'The difference between Editors and Editorial Board is not explained' is unlikely to be experienced by real users. Where real users are observed, such false positives should not occur, providing the users are representative of the user population as a whole. The heuristic evaluation also identified potential false positives, such as 'There is no Help facility'. The direct observation did not identify the need for a help facility (though this could be due to the use of proxy users). This reflects the difficulty in selecting and applying appropriate heuristics.


• Search facilities are not efficient enough
• The index is not consistent across the Web site
• 'Return to top' is not used consistently
• There are four broken links in the Web site
• There is no Help facility
• Search facilities are not instantly visible / grouping of links is not effective
• The Instructions to Authors link is not available next to each theme for submission
• A Volume page in the Hypermedia Journal does not link back to the Volume
• Pages do not provide the creator, date of creation, date of update and copyright
• The title (in the <TITLE> tag) is not always representative of the Web site
• On a small screen the home page is too long
• As the Web site gets larger the index will be very long
• The layout is too simple and not inviting
• The Hypermedia Journal is not visible in Volume Contents on a small screen
• The distinction between the two journals is not emphasised enough
• The author's address is not available when clicking on their name
• The difference between Editors and Editorial Board is not explained
• Subscription information is insufficient
• The list of papers under an author's name is not numbered
• The purpose of the Web site is not stated
• The 'no abstract available' message causes confusion
• There are no instructions on how to get a full paper

Table 2: NRHM comparison of methods - the twenty-two potential usability problems identified across heuristic evaluation, direct observation, log analysis and the on-line questionnaire

Heuristic evaluation identified the largest number of potential usability problems, but some of these, such as 'The "no abstract available" message causes confusion', were only found because the inspection was comprehensive. There are only a very small number of papers which have no abstract, so if a scenario-based inspection had been performed it is unlikely that this problem would have been discovered.

The purpose in identifying usability problems is that they can then be rectified. However, in many cases it may not be cost effective to rectify all the problems that have been identified. In order to make an informed choice, some form of severity ranking is necessary (Nielsen 1999). Often this involves ranking by, and agreement between, expert evaluators. Methods for placing this on a systematic footing, so that non-expert evaluators can perform this activity effectively, have yet to be developed.
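
As a minimal sketch of the kind of severity ranking this implies, the following averages independent ratings from several evaluators on the 0-4 scale Nielsen describes (0 = not a problem, 4 = usability catastrophe) and sorts problems by the result. The problems and scores shown are invented for illustration, and averaging is only one of several possible ways of combining ratings.

    # Sketch: aggregating severity ratings from several evaluators so that
    # problems can be prioritised for fixing. Ratings use the 0-4 scale
    # described by Nielsen; problems and scores are invented for illustration.
    from statistics import mean

    ratings = {                      # problem -> one rating per evaluator
        "No Help facility": [1, 0, 2],
        "Index inconsistent across site": [3, 3, 2],
        "Broken links": [4, 3, 3],
    }

    ranked = sorted(ratings.items(), key=lambda kv: mean(kv[1]), reverse=True)
    for problem, scores in ranked:
        print(f"{mean(scores):.1f}  {problem}")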

This case study used a variety of complementary evaluation methods and suggests that a combination of methods is effective in assessing usability from a number of perspectives. Direct observation and inspection methods proved useful for identifying specific usability problems. The questionnaire and log analysis provided useful information on actual user behaviour, rather than behaviour under evaluation conditions, and could be useful for tracking changes in user behaviour over time. Whilst each of the methods used can potentially produce useful information, it is important to recognise that each method also has limitations. The use of an inappropriate method, or of a method under inappropriate conditions or assumptions, is likely to result in misleading or incomplete results.

Future Evaluation Studies at Glamorgan

We are currently considering the design of future evaluations of research prototypes of thesaurus-based retrieval systems.


The University of Glamorgan Hypermedia Research Unit has a longstanding interest in the question of how the semantic structure underlying information can be used to enhance browsing and search tools, particularly in the cultural heritage domain. A classification system, or thesaurus, embodies a semantic network of relationships between terms. Thus it has some inherent notion of distance between terms, their 'semantic closeness'. Distance measurements between terms can be exploited to provide more advanced navigation tools than relying on fixed, embedded links. The algorithm is based on a traversal function over the underlying semantic net (Tudhope and Taylor 1997), a function of the steps taken to move from one term in the index space to another; an exact match of terms is not required. Each traversal diminishes the semantic closeness by a cost factor which varies with the type of semantic relationship connecting the two terms. Previous research employed a small testbed, largely archival photographs from the Pontypridd Historical and Cultural Centre, indexed by the Social History and Industrial Classification (SHIC 1993). Advanced navigation options included query expansion when a query fails to return results and requesting information similar to the current item (Cunliffe et al. 1997).
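
A minimal sketch of the traversal idea just described: semantic closeness starts at 1.0 at the query term and is multiplied by a relationship-dependent cost factor for every step taken through the thesaurus, so terms reached via fewer or 'cheaper' relationships score higher. The thesaurus fragment, relationship types and factor values below are illustrative assumptions, not the measures used in the Glamorgan prototypes.

    # Sketch: cost-weighted traversal over a small thesaurus fragment. Closeness
    # starts at 1.0 and is multiplied by a factor (< 1) for each relationship
    # traversed; the factors and the example terms are illustrative only.
    import heapq

    # Illustrative cost factors per relationship type (BT = broader term,
    # NT = narrower term, RT = related term). Tuning such factors is exactly the
    # kind of question an expert evaluation of the distance measures addresses.
    FACTORS = {"BT": 0.9, "NT": 0.9, "RT": 0.6}

    # term -> list of (neighbour, relationship type); a tiny invented fragment.
    THESAURUS = {
        "coal mining": [("mining", "BT"), ("pit ponies", "RT")],
        "mining": [("coal mining", "NT"), ("quarrying", "RT"), ("extractive industries", "BT")],
        "quarrying": [("mining", "RT")],
        "pit ponies": [("coal mining", "RT")],
        "extractive industries": [("mining", "NT")],
    }

    def closeness_from(start):
        """Best semantic closeness from `start` to every reachable term."""
        best = {start: 1.0}
        heap = [(-1.0, start)]                     # max-heap via negated closeness
        while heap:
            neg, term = heapq.heappop(heap)
            if -neg < best.get(term, 0.0):
                continue                           # stale queue entry
            for neighbour, rel in THESAURUS.get(term, []):
                c = -neg * FACTORS[rel]
                if c > best.get(neighbour, 0.0):
                    best[neighbour] = c
                    heapq.heappush(heap, (-c, neighbour))
        return best

    # Terms ranked by closeness to the query term can drive query expansion or
    # "similar items" suggestions when an exact index-term match fails.
    print(sorted(closeness_from("coal mining").items(), key=lambda kv: -kv[1]))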

One current research project, emphasising spatial access to cultural heritage collections, investigates the combination of various distance measures to augment the retrieval capabilities of online gazetteers or geographical thesauri (for example, the TGN - Getty Thesaurus of Geographic Names). Another, EPSRC-funded, collaborative project with the National Museum of Science and Industry (NMSI) will make use of the Getty Art and Architecture Thesaurus (AAT) as the terminology system. We are currently considering evaluation strategies for these projects.

As discussed above, previous museum-setting evaluation of prototype information exploration tools raised issues to consider before inviting members of the general public to trial any system (Tudhope et al. 1994). It proved difficult to separate access and navigation issues from general user interface issues in the evaluation of the prototype, which had not involved surface details of the interface as a prime concern. Based on this experience, we intend to distinguish evaluation and refinement of the distance measures from empirical, operational studies of online use with representative users. Evaluation more narrowly focused on the distance measures will need to occur before more contextual evaluations of actual use, which will include consideration of broader HCI issues and visualisation of results. The initial evaluation of distance measures is better suited to expert users in the subject domain with some knowledge of the use of controlled vocabularies in indexing and retrieval. Key issues include tuning of controlling parameters for the semantic distance measures and feedback on application scenarios.

The basic assumption that there is a cognitive basis for a semantic distance effect over terms in thesauri has been investigated by Brooks (1995) in a series of experiments exploring the relevance relationships between bibliographic records and topical subject descriptors. Subjects are essentially asked to assess the similarity of two texts (the record and the descriptor). Analysis found a semantic distance effect, with an inverse correlation between semantic distance and relevance assessment, modified by various factors. Rada has also conducted experiments on semantic distance measures using the Medical Subject Headings (MeSH) thesaurus and Excerpta Medica (EMTREE). In these experiments, expert users (physicians familiar with information retrieval systems) were asked to judge the semantic closeness of the same set of citations and the query (Rada and Barlow 1991). Results were ranked and the experts' evaluations compared with different versions of the algorithm. These experiments suggest a starting point for our initial evaluation of semantic distance measures. Expert subjects will be given an initial information need, expressed as a set of thesaurus terms, and asked to compare that with a sample result set of information items and their index terms. Subjects will be asked to rank (or score) items and then compare that ranking with the retrieval tool's results. Providing information-seeking scenarios will probably be necessary to assess the use of semantic distance measures in retrieval in a realistic manner.
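
One plausible way of making the comparison between an expert's ranking and the retrieval tool's ranking is a rank correlation coefficient. The sketch below computes Spearman's rho for two rankings of the same items (the simple formula assumes no tied ranks); the items and ranks are invented, and this is offered as an illustration rather than the evaluation design we will necessarily adopt.

    # Sketch: comparing an expert's ranking of retrieved items with the ranking
    # produced by a semantic-distance retrieval tool, using Spearman's rank
    # correlation (formula valid when there are no tied ranks). Data are invented.
    def spearman_rho(expert_rank, tool_rank):
        n = len(expert_rank)
        d_squared = sum((expert_rank[item] - tool_rank[item]) ** 2 for item in expert_rank)
        return 1 - (6 * d_squared) / (n * (n * n - 1))

    expert = {"item A": 1, "item B": 2, "item C": 3, "item D": 4, "item E": 5}
    tool   = {"item A": 2, "item B": 1, "item C": 3, "item D": 5, "item E": 4}
    print(f"rho = {spearman_rho(expert, tool):.2f}")   # 1.0 would be perfect agreement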

References

Barker P., King T. 1993. Evaluating interactive multimedia courseware - a methodology. Computers in Education, 21(4), 307-319.
Bevan N. 1998. Usability issues in web site design (version 3, April 98). http://www.usability.serco.com/netscape/index.html; also in Proceedings UPA '98.
Brooks T. 1995. People, words, and perceptions: a phenomenological investigation of textuality. Journal of the American Society for Information Science, 46(2), 103-115.
Cunliffe D., Taylor C., Tudhope D. 1997. Query-based navigation in semantically indexed hypermedia. Proceedings 8th ACM Conference on Hypertext (Hypertext '97), 87-95.
Faraday P., Sutcliffe A. 1997. Evaluating multimedia presentations. New Review of Hypermedia and Multimedia, 3, 7-37.
Fidel R. 1997. The image retrieval task: implications for the design and evaluation of image databases. New Review of Hypermedia and Multimedia, 3, 181-199.
Garzotto F., Matera M. 1997. A systematic method for hypermedia usability evaluation. New Review of Hypermedia and Multimedia, 3, 39-65.
Gray W., Salzman M. 1998. Damaged merchandise? A review of experiments that compare usability evaluation methods. Human-Computer Interaction, 13, 203-261.
GVU 1998. GVU's 10th WWW User Survey. http://www.cc.gatech.edu/gvu/user_surveys/survey-1998-10/
Hardman L. 1989. Evaluating the usability of the Glasgow Online hypertext. Hypermedia, 1(1), 34-63.
Kritou E. 1998. A case study of web usability evaluation. MSc thesis, School of Computing, University of Glamorgan.
Marchionini G. 1995. Information Seeking in Electronic Environments. Cambridge University Press.
McCorry H., Morrison I. 1995. The Catechism Project. National Museums of Scotland: Edinburgh.
Morrison I. 1998. SCRAN and its users. New Review of Hypermedia and Multimedia, 4, 255-260.
Nielsen J. 1993. Usability Engineering. Academic Press: Boston.
Nielsen J. 1999. Severity ratings in heuristic evaluation. http://www.useit.com/papers/heuristic/severityrating.html
NRHM 1999. http://www.comp.glam.ac.uk/~NRHM/
Pejtersen A. 1989. The Book House: modelling users' needs and search strategies as a basis for system design. Riso-M-2794 Technical Report, Riso National Laboratory, Denmark.
Rada R., Barlow J. 1991. Document ranking using an enriched thesaurus. Journal of Documentation, 47(3), 240-253.
Rosenfeld L., Morville P. 1998. Information Architecture for the World Wide Web. O'Reilly.
Salomon G. 1990. Designing casual-use hypertext: the CHI-89 InfoBooth. Proceedings ACM Conference on Computer-Human Interaction.
Sears A. 1997. Heuristic walkthroughs: finding the problems without the noise. International Journal of Human-Computer Interaction, 9(3), 213-234.
SHIC 1993. Social History and Industrial Classification: a subject classification for museum collections. Museum Documentation Association: Cambridge, UK.
Shneiderman B. 1997. Designing information-abundant web sites: issues and recommendations. International Journal of Human-Computer Studies, 47, 5-29.
Shneiderman B. 1998. Designing the User Interface. Addison-Wesley: Reading, Mass.
Shneiderman B., Brethauer D., Plaisant C., Potter R. 1989. Evaluating three museum installations of a hypertext system. Journal of the American Society for Information Science, 40(3), 172-182.
Shum S., McKnight C. (eds.) 1997. Special issue on World Wide Web usability. International Journal of Human-Computer Studies, 47.
Smith P. 1996. Towards a practical measure of hypertext usability. Interacting with Computers, 8(4), 365-381.
Smith P., Newman I., Parks L. 1997. Virtual hierarchies and virtual networks: some lessons from hypermedia usability research applied to the World Wide Web. International Journal of Human-Computer Studies, 47, 67-95.
Smith S., Mosier J. 1986. Guidelines for Designing User Interface Software. Mitre Corporation Report MTR-9420, Mitre Corporation.
Thomas N., Paterson I. 1998. Science Museum Web site assessment. Research report by Solomon Business Research. http://www.nmsi.ac.uk/eval/
Trant J. 1997. Multimedia evaluation criteria: revised draft. ICOM CIDOC Multimedia Working Group. http://www.cidoc.icom.org/
Tudhope D., Beynon-Davies P., Taylor C., Jones C. 1994. Virtual architecture based on a binary relational model: a museum hypermedia application. Hypermedia, 6(3), 174-192.
Tudhope D., Taylor C. 1997. Navigation via similarity: automatic linking based on semantic closeness. Information Processing and Management, 33(2), 233-242.
Tudhope D., Taylor C. 1998. Terminology as a search aid: using SHIC for public access. mda Information, 3(1), 79-84. Museum Documentation Association: Cambridge.