TRANSCRIPT
Involving Users in Interface Evaluation
Marti Hearst (UCB SIMS)
SIMS 213, UI Design & Development
April 8, 1999
Adapted from slide by James Landay
Outline
Why do user testing?
Informal studies
– collecting and analyzing process data
– ethical considerations
Formal studies
– choosing variables
– interaction effects
– special considerations for studies involving users
What is Usability?
The extent to which a product can be used by specified users to achieve specified goals with
– effectiveness
– efficiency
– satisfaction
in a specified context of use. [ISO 9241]
Usability evaluation is a methodology for measuring these usability aspects of a user interface.
Why do User Testing?
Can’t tell how good or bad a UI is until people use it!
Other methods are based on evaluators, who
– may know too much
– may not know enough (about tasks, etc.)
Summary: it is hard to predict what real users will do.
Two Main Approaches
Informal studies: get a feeling for how users will use the interface
– participants work through task scenarios
– gather process data
– obtain user preference information
Formal studies
– isolate the effects of particular UI components
– compare competing designs
– make quantitative measurements
Why Two Main Approaches?
Informal study
– process data is easier to gather
– gives an overview of where the big problems are
Formal study
– need many participants to prove your points (obtain statistical significance)
– experiments that isolate effects properly often end up measuring things that are too fine-grained to really inform UI design
Informal Study
Select tasks
Select participant groups
Decide on a methodology for collecting data, and what kinds of data to collect
Do the study
Analyze the results
Make recommendations for changes to the design
Selecting Tasks
Should reflect what real tasks will be like
Tasks from analysis & design can be used
– may need to shorten if
» they take too long
» they require background the test user won’t have
Avoid bending tasks in the direction of what your design best supports
May have to simplify in order to produce usable results
Choosing Participants
Should be representative of eventual users in terms of
– job-specific vocabulary / knowledge
– tasks
If you can’t get real users, get an approximation
– system intended for doctors
» get medical students
– system intended for electrical engineers
» get engineering students
Use incentives to get participants
Deciding on Data to Collect
Process data
– observations of what users are doing & thinking
– kinds of errors made
– general strategies used and not used
The “Thinking Aloud” Method
Need to know what users are thinking, not just what they are doing
Ask users to talk while performing tasks
– tell us what they are thinking
– tell us what they are trying to do
– tell us questions that arise as they work
– tell us things they read
Make a recording or take good notes
– make sure you can tell what they were doing
Thinking Aloud (cont.)
Prompt the user to keep talking
– “tell me what you are thinking”
Only help on things you have pre-decided
– keep track of anything you do give help on
Recording
– use a digital watch/clock
– take notes, plus if possible
» record audio and video (or event logs)
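The event-log option above can be as simple as a timestamped list of observations kept alongside the notes; a minimal sketch (the logged event strings are hypothetical examples):

```python
# Minimal event log for a test session, as a lightweight
# alternative to (or companion for) audio/video recording.
import time

log = []

def record(event):
    """Timestamp each observation so it can be matched back to the task."""
    log.append((time.strftime("%H:%M:%S"), event))

record("task 1 started")
record("user opened wrong menu (error)")
record("prompted: 'tell me what you are thinking'")
record("task 1 completed")

for ts, event in log:
    print(ts, event)
```

Even this much makes it possible to tell afterward what the participant was doing when each critical incident occurred.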
Ethical Considerations
Sometimes tests can be distressing
You have a responsibility to alleviate this
– make participation voluntary, with an informed consent form
– avoid pressure to participate
– let them know they can stop at any time [Gomoll]
– stress that you are testing the system, not them
– make the collected data as anonymous as possible
Often must get official approval for the use of human subjects
– there is a campus exception for class projects
User Test Proposal
A report that contains
– objective
– description of the system being tested
– hypotheses
– task environment & materials
– participants
– methodology
– tasks
– test measures
A good strategy: get this approved, then reuse it when writing up your results
Using the Test Results
Summarize the data
– make a list of all critical incidents (CIs)
» positive & negative
– include references back to the original data
– try to judge why each difficulty occurred
What does the data tell you?
– Did the UI work the way you thought it would?
– Is something missing?
Using the Results (cont.)
Update the task analysis and rethink the design
– rate the severity & ease of fixing each CI
– fix the severe problems & make the easy fixes
Will thinking aloud give the right answers?
– not always
– if you ask a question, people will always give an answer, even if it has nothing to do with the facts
» try to avoid specific questions
Measuring User Preference
How much users like or dislike the system
– often use Likert scales
– or have them choose among statements
» “best UI I’ve ever…”, “better than average”, …
– hard to be sure what the data will mean
» novelty of the UI, feelings, not a realistic setting, etc.
– Shneiderman’s QUIS is a general example (in the reader)
If many participants give you low ratings -> trouble
Can get some useful data by asking
– what they liked, disliked, where they had trouble, best part, worst part, etc. (redundant questions)
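A quick way to spot the "many low ratings -> trouble" signal is to summarize each questionnaire item; a sketch assuming 1–5 Likert ratings (the question names and ratings below are hypothetical):

```python
# Summarize 1-5 Likert ratings from a post-test questionnaire
# and flag items whose mean falls below the scale midpoint.
from statistics import mean

responses = {
    "easy to learn":        [4, 5, 3, 4, 5, 4],
    "error messages clear": [2, 1, 2, 3, 2, 2],
    "overall satisfaction": [4, 3, 4, 4, 5, 3],
}

for question, ratings in responses.items():
    avg = mean(ratings)
    flag = "  <- trouble?" if avg < 3.0 else ""
    print(f"{question:22s} mean={avg:.2f}{flag}")
```

As the slide warns, a flagged mean says *that* users are unhappy, not *why*; pair it with the open-ended questions (liked, disliked, best part, worst part).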
Formal Usability Studies
Situations in which these are useful
– to determine time requirements for task completion
– to compare two designs on measurable aspects
» time required
» number of errors
» effectiveness for achieving very specific tasks
Do not combine with thinking aloud
– talking can affect speed & accuracy (negatively & positively)
Require experiment design
Experiment Design
Experiment design involves determining how many experiments to run and which attributes to vary in each experiment
Goal: isolate which aspects of the interface really make a difference
Experiment Design
Decide on
– Response variables
» the outcome of the experiment
» usually the system performance
» aka dependent variable(s)
– Factors (aka attributes)
» aka independent variables
– Levels (aka values for attributes)
– Replication
» how often to repeat each combination of choices
Experiment Design
Studying a system, ignoring users: say we want to determine how to configure the hardware for a personal workstation (from Jain ’91, The Art of Computer Systems Performance Analysis)
– Hardware choices
» which CPU (three types)
» how much memory (four amounts)
» how many disk drives (from 1 to 3)
– Workload characteristics
» administration, management, scientific
Experiment Design
We want to isolate the effect of each component for the given workload type. How do we do this?
– WL1 CPU1 Mem1 Disk1
– WL1 CPU1 Mem1 Disk2
– WL1 CPU1 Mem1 Disk3
– WL1 CPU1 Mem2 Disk1
– WL1 CPU1 Mem2 Disk2
– …
There are (3 CPUs) × (4 memory sizes) × (3 disk sizes) × (3 workload types) = 108 combinations!
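One way to see where the 108 comes from is to enumerate the full-factorial design directly; a sketch of the workstation example (the level names are placeholders for the choices above):

```python
# A full-factorial design runs one experiment per combination
# of factor levels: 3 workloads x 3 CPUs x 4 memory sizes x 3 disk counts.
from itertools import product

factors = {
    "workload": ["admin", "mgmt", "scientific"],
    "cpu":      ["CPU1", "CPU2", "CPU3"],
    "memory":   ["Mem1", "Mem2", "Mem3", "Mem4"],
    "disks":    [1, 2, 3],
}

runs = list(product(*factors.values()))
print(len(runs))  # 3 * 3 * 4 * 3 = 108 experiments
```

The count grows multiplicatively with every factor added, which is exactly why the reduction strategies on the next slides matter.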
Experiment Design
One strategy to reduce the number of comparisons needed:
– pick just one attribute
– vary it
– hold the rest constant
Problems:
– inefficient
– might miss effects of interactions
Interactions among Attributes

Non-interacting:
       A1  A2
  B1    3   5
  B2    6   8

Interacting:
       A1  A2
  B1    3   5
  B2    6   9

[Figure: response plotted against A for each level of B — the B1 and B2 lines are parallel in the non-interacting case, and diverge in the interacting case.]
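The two tables above can be told apart numerically: in the non-interacting case, the effect of switching A1 -> A2 is the same at both levels of B, so the "difference of differences" is zero. A small sketch:

```python
# Detect a two-factor interaction in a 2x2 table as a non-zero
# difference of differences.
def interaction(table):
    """table[b][a] holds the response at level a of A and level b of B."""
    effect_of_A_at_B1 = table[0][1] - table[0][0]
    effect_of_A_at_B2 = table[1][1] - table[1][0]
    return effect_of_A_at_B2 - effect_of_A_at_B1  # 0 => purely additive

print(interaction([[3, 5], [6, 8]]))  # non-interacting table -> 0
print(interaction([[3, 5], [6, 9]]))  # interacting table -> 1
```

A one-attribute-at-a-time strategy can never observe this quantity, since it never varies A and B together.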
Experiment Design
Another strategy: figure out which attributes are important first
Do this by comparing just a few major attributes at a time
– if an attribute has a strong effect, include it in future studies
– otherwise assume it is safe to drop it
This strategy also allows you to find interactions between a few attributes
Special Considerations for Formal Studies with Human Participants
Studies involving human participants vs. measuring automated systems:
– people get tired
– people get bored
– people (may) get upset by some tasks
– learning effects
» people will learn how to do the tasks (or the answers to questions) if they are repeated
» people will (usually) learn how to use the system over time
More Special Considerations
High variability among people
– especially when involved in reading/comprehension tasks
– especially when following hyperlinks! (they can go all over the place)
Between Groups vs. Within Groups
Do participants see only one design, or both?
Between-groups experiment
– two groups of test users
– each group uses only one of the systems
Within-groups experiment
– one group of test users
» each person uses both systems
» can’t use the same tasks (learning)
– best for low-level interaction techniques
Why is this a consideration?
– People often learn during the experiment.
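Because people learn during the experiment, a within-groups study usually counterbalances which system each participant sees first, so learning does not systematically favor one design. A minimal sketch (the participant IDs and design names are hypothetical):

```python
# Counterbalance presentation order in a within-groups study:
# alternate which system each participant uses first.
participants = ["P1", "P2", "P3", "P4"]
systems = ("Design A", "Design B")

for i, p in enumerate(participants):
    # even-numbered participants start with Design A, odd with Design B
    order = systems if i % 2 == 0 else systems[::-1]
    print(p, "->", " then ".join(order))
```

With an even number of participants, each design is seen first equally often, so any practice effect is spread across both conditions.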
Summary
User testing is important, but takes time/effort
Use real tasks & representative participants
Be ethical & treat your participants well
Want to know what people are doing & why
– collect process data
– early testing can be done on mock-ups (low-fi)
More on formal studies next time.