
Page 1: Evaluation

Evaluation

a. Why / when

b. Evaluation representations and techniques
1. User-based
2. (Expert-)knowledge-based
3. Analytical
4. Norms and standards
5. Technical

c. Summary

Page 2: Evaluation

a. Why evaluate and test?

Usability according to ISO 9241-11:
• Effectiveness – does it work for prospective users?
• Efficiency – how much (time, effort) does it cost them?
• Satisfaction – their subjective reaction

Evaluation improves the design:
• User-centered: is this website useful and usable for its intended users?
• It is the cheapest way to repair errors: the earlier, the better
• Involving users and customers promotes acceptance of the product

Page 3: Evaluation

Why evaluate and test early?

Source: Hawksmere - ISO seminar material

• Cost of fixing an error, by project phase (reconstructed from the chart):
Analysis & design: $1,000
Implementation: $6,000
Maintenance: $60,000

Page 4: Evaluation

When to evaluate?

[Diagram: user involvement and expert involvement plotted across the phases Discovery, Elaboration, Analysis, Construction, Transition, and Maintenance. Techniques placed along this timeline: target group analysis, concept testing, focus group sessions, surveys, expert review, active usability testing, intermediate usability testing, remote usability testing, and continuous usability evaluation.]

Page 5: Evaluation

When to evaluate?
• Early in the design process:
– Conceptual (goal, tasks, type of user, concept of the website, etc.)
– No website-specific tasks yet
• Later:
– Specific tasks are known, so they can be tested
– Too late for conceptual errors

Page 6: Evaluation

b. Evaluation representations and techniques

Evaluation is based on representations (models of the system):

Formal representations, to be used by the design team:
• CCT, ETAG, GOMS, NUAN, …

Representations for users, clients, and expert colleagues:
• scenario
• simulation and mock-up
• interactive prototype

Page 7: Evaluation

Evaluation in design phases

Scenario and simulation: claims analysis

prototype: cognitive walk-through

prototype and implemented system:
• heuristic evaluation
• objective observation (usability lab)
• subjective usability evaluation
• mental representation and activity (hermeneutic techniques)

implemented system: standards (ISO), performance measures

Page 8: Evaluation

Types of evaluation techniques

1. User-based (the user)
2. Knowledge-based (experience and knowledge)
3. Analytical (statistical data)
4. Norms and standards
5. Technical (code, implementation) – not elaborated here ("engineering expertise")

Page 9: Evaluation

1. User-based

User-centered design: involve the user in the design

In various ways:
• Interview (individual)
• Focus group (8-10 participants)
• Observation (individual)

Page 10: Evaluation

1. User-based
What do you evaluate?
• Intermediate results:
– Information architecture (e.g. via card sorting)
– Wireframes
– Graphic design
– Screenshots
– Etc.
• Prototype:
– Paper
– Interactive mockup (e.g. a clickable PowerPoint)
• Working website

Page 11: Evaluation

1. User-based, example: focus group

Page 12: Evaluation

1. User-based: observation
Types of observations:
• Assignments with pre-selected tasks (in a usability laboratory):
+/- controlled environment
+ specific procedure
+ easy to record
- the user feels 'watched'
- users give up less quickly
• "Normal" use (field study)

Page 13: Evaluation

A typical usability lab

[Diagram: an observation room and a study room separated by a one-way mirror; sound-proof walls; a video camera mounted on the ceiling; AV equipment; dual display; mobile devices; digital TV.]

Page 14: Evaluation

The observation room

Page 15: Evaluation

The user room

Page 16: Evaluation

1. User-based observation
Based on tasks that are typical for the user (cf. scenarios and flowcharts).
• Qualitative: what problems does the user run into? Plus opinions and remarks.
– Is the task feasible? How long does the user take?
– If it does not go right the first time, where does the user look next?
– Which words does the user not understand?
– Which elements stand out immediately, and which do not?
– Where does the user click?
– How is the scrollbar used?
• Quantitative: usability metrics per task (time, number of errors, number of steps, number of tasks, etc.); see the sketch below.
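To make the quantitative side concrete, here is a minimal Python sketch of how per-task metrics could be computed from observation records. The record layout and the example numbers are hypothetical; the slide itself prescribes no format.

```python
from statistics import mean

# Each record: (task_id, time in seconds, errors, steps, completed?)
observations = [
    ("task1", 95, 1, 7, True),
    ("task1", 140, 3, 11, False),
    ("task2", 60, 0, 4, True),
]

def task_metrics(records, task_id):
    """Aggregate the usability metrics listed on this slide for one task."""
    rows = [r for r in records if r[0] == task_id]
    return {
        "mean_time_s": mean(r[1] for r in rows),
        "mean_errors": mean(r[2] for r in rows),
        "mean_steps": mean(r[3] for r in rows),
        "completion_rate": sum(r[4] for r in rows) / len(rows),
    }

print(task_metrics(observations, "task1"))
# {'mean_time_s': 117.5, 'mean_errors': 2, 'mean_steps': 9, 'completion_rate': 0.5}
```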

Page 17: Evaluation

1. User-based: field study
Types of observations:
• Assignments with pre-selected tasks (in a usability laboratory):
+/- controlled environment
+ specific procedure
+ easy to record
- the user feels 'watched'
- users give up less quickly
• "Normal" use (field study):
+ natural setting and natural motivation
+/- with unforeseen events
+ freer course of events
- harder to record
- little room for observers

Page 18: Evaluation

Test tasks
• Tasks, not functionalities
– GOOD: "Where can you buy the new Harry Potter book?"
– WRONG: "Search the legislation section for the conditions for rent subsidy in the housing regulations"
• A question, not a command
– E.g. (website): "How much does this product cost?" Not: "Find the product information"
– Give the user freedom in how to perform the task.
• Tasks must be realistic and typical (cf. scenarios)
• Tasks must 'cover' the product reasonably well
– Different aspects / components / functionality
– Typically 10-15 tasks (45 minutes)

Page 19: Evaluation

2. Knowledge-based evaluation

• Based on the knowledge and experience of designers

• Cognitive walkthrough
• Heuristic evaluations
• Checklists

Page 20: Evaluation

Expert evaluation: cognitive walkthrough

Definition: "finding usability problems in a user interface by having a small set of evaluators examine the interface and give an opinion for each step in the dialogue for a selected set of scenarios"

Evaluators: user interface specialists, not from the design team

Page 21: Evaluation

Cognitive walkthrough

Specify scenarios for possible problematic interactions, at the level of single user and system actions

Ask the evaluator to answer a small set of standard questions for each step

Example question set:
• What would a normal user do in this situation?
• Why (based on what information or knowledge)?
• What would the user expect the system to do next?

Page 22: Evaluation

Cognitive walkthrough

Problems:
• not possible to consider all possible scenarios
• no information on recovery from errors
• the time aspect is not considered

Benefits:
• very early indications of problems of representation of information and of consistency

Page 23: Evaluation

Cognitive walkthrough

• A systematic method for walking through the site

• Perform a typical task on the site (or a prototype) and check whether an "average" user would be able to carry out all of the associated steps.

• Can be carried out by one person (the designer)

Page 24: Evaluation

Cognitive walkthrough

• Consists of a number of steps:
1. Define the target group for the test
2. Create realistic scenarios
3. Walk through the scenarios using 'the 4 questions'
4. Analyze each scenario and propose design improvements

• Steps 1 and 2 have already been done in the task analysis

Page 25: Evaluation

Cognitive walkthrough

The four questions used to analyze each step of the scenarios (a recording sketch follows below):

• What does the user want to achieve as the next step in this situation?
• What does the user think he has to do now?
• Why does the user think that this is the right action?
• Which system response does the user expect?
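As a concrete way to record a walkthrough's outcome, here is a minimal sketch of a data structure holding the answers to the four questions per scenario step. All names and the example scenario are hypothetical; the slides prescribe no particular recording format.

```python
from dataclasses import dataclass, field

@dataclass
class StepAnalysis:
    step: str               # the single user/system action under review
    goal: str               # Q1: what does the user want to achieve here?
    expected_action: str    # Q2: what does the user think he has to do now?
    why_correct: str        # Q3: why does the user think this is the right action?
    expected_response: str  # Q4: which system response does the user expect?
    problems: list = field(default_factory=list)  # usability problems found at this step

@dataclass
class Scenario:
    description: str
    steps: list = field(default_factory=list)

# Example: one step of a hypothetical "buy a book" scenario
s = Scenario("Buy the new Harry Potter book")
s.steps.append(StepAnalysis(
    step="click 'Search'",
    goal="find the book's product page",
    expected_action="type the title and press Search",
    why_correct="the search box is the most prominent element",
    expected_response="a list of matching books",
))
```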

Page 26: Evaluation

Heuristic evaluation

• Heuristic = rule of thumb.
• In most cases, heuristics guarantee basic usability
• Carried out against particular aspects and principles:
– e.g. functionality, dialogue, representation, …
• Can be done by a usability specialist
• Can be done with a group
– Several people provide complementary insights

Page 27: Evaluation

Heuristic Evaluation (Nielsen)
• Visibility of system status
– The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
• Match between system and the real world
– The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
• User control and freedom
– Users often choose system functions by mistake and need a clearly marked "emergency exit" to leave unwanted states without having to go through an extended dialogue.

Page 28: Evaluation

Heuristic Evaluation
• Consistency and standards
– Users must not have to wonder whether different words, situations, or actions mean the same thing.
• Error prevention
– Even better than good error messages is a careful design which prevents a problem from occurring in the first place.
• Recognition rather than recall
– Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
• Flexibility and efficiency of use
– Accelerators, unseen by novices, may speed up interaction for experts, so that systems can cater to both inexperienced and experienced users. Let users tailor frequent actions.

Page 29: Evaluation

Heuristic Evaluation
• Help users recognise, diagnose, and recover from errors
– Express error messages in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
• Help and documentation
– Even though systems are best used without documentation, it may be necessary to provide help. This should not be too large, should be easy to search, focused on user tasks, and should list concrete steps to be carried out.
• Aesthetic and minimalist design
– Dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with relevant units of information and diminishes their relative visibility.

Page 30: Evaluation

Expert evaluation: heuristic evaluation checklist (Roe & Arnold)

• A checklist of recommendations on human-computer interface design

Roe, 1985 & Arnold, 1988 (revised), based on Hacker's and Rasmussen's action-theoretical framework (see Arnold's theoretical paper for ISO/TC159/SC4/WG5/SG1).

Adapt to the user

(1) Adapt as much as possible to the user's knowledge of language, work procedures, computer operation, etc.
• Use the user's vocabulary.
• Accept different names for a command.
• Accept lower-case characters where capitals are asked for.
• Use polite language.
• Follow common rules of syntax.
• Let the user dominate the interaction.
• Assimilate to the user's model of actions and operations.
• Offer several levels of dialogue.
• Make response times in similar situations approximately equal.
• Use maximum response times for various operations.
• If response time exceeds 15 seconds, state a reason and the estimated waiting time.

Page 31: Evaluation

Heuristic evaluation

Provide means for planning

(2) Supply means for action preparation, especially for orientation and action-program design.
• Give an overview of the task situation (including defaults).
• Enable the user to develop a record of a personal action program.
• Be aware of relations between questions.
• Indicate possible answers in the case of closed questions.
• Indicate the size of input fields.
• Present optional parameters on request.
• Protect the user against commands with far-reaching consequences.
• Make errors caused by system limitations self-explanatory.

Page 32: Evaluation

Heuristic evaluation

Provide uninterrupted interaction

(3) Contribute to an uninterrupted, swift execution of action programs by giving signals and feedback on the course and result of activities.
• Indicate that input is expected (prompts).
• Echo each key press.
• Always present the result of data modification.
• Indicate that a command has been accepted.
• Indicate that a command has been executed.
• Give information on the results of an executed command, when asked for.
• Do not change the condition of the system unless a request has been made.
• Give error messages in a timely manner, but do not interrupt the action unnecessarily.
• Let error messages indicate what the user should do.
• Ask for re-entry of incorrect data only.
• Provide for flexible data validation.

Page 33: Evaluation

Heuristic evaluation

Allow modifications

(4) Leave room for modification of action programs and their manner of execution.
• Make each command reversible.
• Offer simple means for correcting entered data, both before and after execution of a command.
• Offer "short-circuiting" options to experienced users.
• Enable the user to enter data in advance.
• Give an escape option for any condition.
• Use ".", "Help", and/or a special function key.

Page 34: Evaluation

Heuristic evaluation

Provide supervision

(5) Supply means for supervision of action execution, including anticipation of future operations or actions.
• Present an overview of the interaction history.
• Present an overview of the coming steps of the interaction.
• Make the dialogue (partly) reversible.

Page 35: Evaluation

Heuristic evaluation

Support user limitations

(6) Take into account the capabilities and limitations of sensory, cognitive and motor mechanisms.
• Let login procedures start automatically.
• Do not use abbreviations on the screen.
• Do not use abbreviations or codes in error messages.
• Avoid giving too much information.
• Provide a consistent and uniform display layout.
• Use a standard location for system messages.
• Avoid shifts of information in the case of insertions.
• Adopt a standard structure for all commands.
• Pay extra attention to frequently used commands.
• Use keywords as arguments.
• Use a few commands with many parameters.
• Do not use more than 7 items in menu selection.
• Do not make keywords longer than 7 characters.
• Use consistent names.
• Avoid overly high requirements on input precision.
• Limit the amount of keying activity.
• Minimize the control and positioning operations.
• Put the cursor on the place where input is wanted.
• Do not ask for zeros at the beginning of a number.

Page 36: Evaluation

Heuristic evaluation

Support user optimization and changes

(7) Take into account and try to support the user's tendency towards optimization of action efficiency; allow changes of regulation level.
• Offer opportunities for condensed input.
• Offer room for self-defined functions.
• Never ask for redundant data.
• Let the user control the representations of information.
• Use informative and factual error messages.
• Help to recognize user errors.
• Help to recognize software errors.
• Prompt the user to improve the efficiency of repeatedly performed operations.
• Give feedback on a sub-optimal sequence of operations.
• Encourage experimenting.

Page 37: Evaluation

Heuristic evaluation

Contribute to workload balance

(8) Contribute to the maintenance of a workload balance.
• Watch user response times and errors.
• Offer suggestions to interrupt the interaction, or to postpone certain parts, when needed.
• Offer distracting messages, or extra tasks, when needed.

Page 38: Evaluation

Heuristic evaluation

Support more than one task at a time

(9) Take into account and try to support the user's tendency to be engaged in more than one task at the same time.
• Make screen presentation of the various ongoing activities possible.
• Make screen presentations of ongoing activities distinctive, but keep the dialogue consistent.
• Support swift switching between different activities.
• Give an overview of ongoing activities on request.
• Give timely reminders concerning unfinished activities.

Page 39: Evaluation

3. Analytical evaluation
Quantitative, based on numbers

• Questionnaires:
– Send to as many people as possible
– Subjective!!

• Hit logs:
– An extended site meter
– Page hits + transfer rates. Which pages are visited most, and from where do people go where? (See the sketch below.)
– Interpretation is speculative
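As an illustration of basic hit-log analysis, here is a minimal Python sketch that counts page hits and transferred bytes. It assumes Common Log Format access logs and a hypothetical file name; the slide does not specify a log format.

```python
from collections import Counter

hits, bytes_sent = Counter(), Counter()
with open("access.log") as f:            # hypothetical file name
    for line in f:
        parts = line.split()
        if len(parts) < 10:
            continue                     # skip malformed lines
        page = parts[6]                  # requested path inside "GET /path HTTP/1.0"
        size = parts[9]                  # response size in bytes ('-' if none)
        hits[page] += 1
        if size.isdigit():
            bytes_sent[page] += int(size)

# The ten most visited pages, with transfer volume
for page, n in hits.most_common(10):
    print(f"{page}\t{n} hits\t{bytes_sent[page]} bytes")
```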

Page 40: Evaluation

Subjective evaluation techniques

No less reliable than objective techniques.

Examples:
• SUMI – software usability
• SMEQ – mental effort
(ESPRIT MUSiC project)
• ISA – instantaneous self-assessment of mental load

Page 41: Evaluation

SUMI (licence needed)
http://www.megataq.mcg.gla.ac.uk/sumi.html

50 statements on the software system
• 5 sub-scales
• for experienced users, in standard working conditions
• diagnosis of usability problems requires at least 10 users

Sub-scales:
• Efficiency
• Affect
• Helpfulness
• Control
• Learnability
Global score: perceived usability

Page 42: Evaluation

SUMI

Scoring through “stencils”

standard scores, based on large samples of industrial product evaluation

Reliable interpretation requires a sample of at least 10 users who “know” the product in normal context of use.

Diagnosis (see the lookup sketch below):
• < 40 – action needed
• > 55 – acceptable software
• > 60 – good software

for individual users or individual questions, see manual
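The diagnosis thresholds above translate directly into a small lookup. A minimal Python sketch; the handling of the unlabeled 40-55 band is an assumption:

```python
def sumi_diagnosis(global_score: float) -> str:
    """Map a SUMI global score to the thresholds given on this slide."""
    if global_score < 40:
        return "action needed"
    if global_score > 60:
        return "good software"
    if global_score > 55:
        return "acceptable software"
    return "borderline (40-55)"  # the slide gives no label for this band

print(sumi_diagnosis(62))  # -> "good software"
```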

Page 43: Evaluation

Analytical evaluation: SUS

System Usability Scale (SUS) – measuring website usability
Digital Equipment Corporation, 1986
John Brooke: [email protected]

A quick and valid tool, based on ISO 9241-11 and the European Community ESPRIT project "MUSiC"

Page 44: Evaluation

SUS

Originally aiming at “software systems”

To be used after users have got to "know" the system in a real-life context.

Later adapted to websites

Validity: correlates well with well-established, more time-consuming general usability scales (e.g. SUMI)

Page 45: Evaluation

SUS

Page 46: Evaluation

SUS

Scoring:
Items 1, 3, 5, 7, 9:
• strongly disagree = 0, …, strongly agree = 4
Items 2, 4, 6, 8, 10:
• strongly disagree = 4, …, strongly agree = 0

Add the scores and multiply the total by 2.5:
total score range 0-100 (see the sketch below)
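The scoring rule is mechanical enough to state as code. A minimal Python sketch, assuming the ten responses are coded 1 (strongly disagree) to 5 (strongly agree):

```python
def sus_score(responses):
    """responses: ten Likert answers for SUS items 1..10, each coded 1..5."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses in the range 1..5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # odd items: strongly disagree = 0, ..., strongly agree = 4
        # even items: strongly disagree = 4, ..., strongly agree = 0
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # total score range 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```

In practice one would average this score over the sample of users; the next slide puts the minimum for repeatable results at 15.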

Page 47: Evaluation

SUS
Reliability:

At least 15 users who have used the website for some realistic tasks under "natural" conditions will lead to repeatable results.

Page 48: Evaluation

SUS

Examples of tasks

• Task 1: Your digital camera uses SmartMedia cards. Find the least expensive external reader (USB) for your PC that will read them.

• Task 2: You do lots of hiking. Find the least expensive personal GPS with map capability and at least 8 MB of memory.

Page 49: Evaluation

For these websites:

Finance.yahoo.com

Page 50: Evaluation
Page 51: Evaluation

SMEQ (no license needed) http://www.megataq.mcg.gla.ac.uk/smeq.html

Cognitive workload for single tasks, performed with a system

• for experienced users, under standard conditions of use
• sample size 10 or more
• very simple scoring, very reliable (r = .82)

ISA (no license needed) [email protected]

An easier way to measure, with simple rating buttons, though less reliable?

Page 52: Evaluation

SMEQ / ISA

[Figure: the SMEQ rating scale, a vertical line from 0 to 150 with verbal anchors. From top to bottom: exceptional; very strong (at 100); strong; fair; reasonable; (50); somewhat; a little; hardly; not at all (at 0).]
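For illustration only, a sketch mapping a SMEQ rating to the nearest verbal anchor on the scale above. The numeric positions of the intermediate anchors are assumptions; the figure only fixes 0, 50, 100, and 150.

```python
# Assumed anchor positions: only 0, 50, 100, and 150 are fixed by the figure.
ANCHORS = [
    (0, "not at all"), (13, "hardly"), (26, "a little"), (39, "somewhat"),
    (57, "reasonable"), (70, "fair"), (85, "strong"),
    (100, "very strong"), (113, "exceptional"),
]

def smeq_label(score: float) -> str:
    """Return the verbal anchor nearest to a 0-150 SMEQ rating."""
    pos, label = min(ANCHORS, key=lambda a: abs(a[0] - score))
    return label

print(smeq_label(47))  # -> "somewhat" with these assumed positions
```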

Page 53: Evaluation

SMEQ Dutch version

Page 54: Evaluation

4. Norms & standards

ISO 9241:
• colors
• non-keyboard input devices
• usability principles
• information presentation
• user guidance
• menus
• command interfaces
• direct manipulation
• form filling
• natural language interfaces

Page 55: Evaluation

c. Summary
• Testing/evaluating is an essential and important part of the design process:
– It yields an improved, and thus good, design
– It prevents errors
– It promotes acceptance
• Testing can take place both early and late in the process
• There are different kinds of tests you can use
– Use the right tests at the right moment