TRANSCRIPT
Involving Users in Interface Evaluation
Marti Hearst (UCB SIMS)
SIMS 213, UI Design & Development
April 8, 1999
Adapted from slide by James Landay
Outline
Why do user testing?
Informal studies
– collecting and analyzing process data
– ethical considerations
Formal studies
– choosing variables
– interaction effects
– special considerations for studies involving users
What is Usability?
The extent to which a product can be used by specified users to achieve specified goals with
– effectiveness
– efficiency
– satisfaction
in a specified context of use. [ISO 9241]
Usability evaluation is a methodology for measuring these usability aspects of a user interface.
Why do User Testing?
Can’t tell how good or bad a UI is until people use it!
Other methods are based on evaluators, who
– may know too much
– may not know enough (about tasks, etc.)
Summary: it is hard to predict what real users will do.
Two Main Approaches
Informal studies: get a feeling for how users will use the interface
– participants work through task scenarios
– gather process data
– obtain user preference information
Formal studies
– isolate the effects of particular UI components
– compare competing designs
– make quantitative measurements
Why Two Main Approaches?
Informal study
– process data is easier to gather
– gives an overview of where the big problems are
Formal study
– need many participants to prove your points (obtain statistical significance)
– experiments that isolate effects properly often end up measuring things that are too fine-grained to really inform UI design
Informal Study
Select tasks
Select participant groups
Decide on a methodology for collecting data, and what kinds of data to collect
Do the study
Analyze the results
Make recommendations for changes to the design
Selecting Tasks
Should reflect what real tasks will be like
Tasks from analysis & design can be used
– may need to shorten if
» they take too long
» they require background the test user won’t have
Avoid bending tasks in the direction of what your design best supports
May have to simplify in order to produce usable results
Choosing Participants
Should be representative of eventual users in terms of
– job-specific vocabulary / knowledge
– tasks
If you can’t get real users, get an approximation
– system intended for doctors
» get medical students
– system intended for electrical engineers
» get engineering students
Use incentives to get participants
Deciding on Data to Collect
Process data
– observations of what users are doing & thinking
– kinds of errors made
– general strategies used and not used
The “Thinking Aloud” Method
Need to know what users are thinking, not just what they are doing
Ask users to talk while performing tasks
– tell us what they are thinking
– tell us what they are trying to do
– tell us questions that arise as they work
– tell us things they read
Make a recording or take good notes
– make sure you can tell what they were doing
Thinking Aloud (cont.)
Prompt the user to keep talking
– “tell me what you are thinking”
Only help on things you have pre-decided
– keep track of anything you do give help on
Recording
– use a digital watch/clock
– take notes, plus if possible
» record audio and video (or event logs)
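The event-log option above can be as simple as a timestamped list of observations kept alongside the notes; a minimal sketch (the logged event strings are hypothetical examples):

```python
# Minimal event log for a test session, as a lightweight
# alternative to (or companion for) audio/video recording.
import time

log = []

def record(event):
    """Timestamp each observation so it can be matched back to the task."""
    log.append((time.strftime("%H:%M:%S"), event))

record("task 1 started")
record("user opened wrong menu (error)")
record("prompted: 'tell me what you are thinking'")
record("task 1 completed")

for ts, event in log:
    print(ts, event)
```

Even this much makes it possible to tell afterward what the participant was doing when each critical incident occurred.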
Ethical Considerations
Sometimes tests can be distressing
You have a responsibility to alleviate this
– make participation voluntary, with an informed consent form
– avoid pressure to participate
– let them know they can stop at any time [Gomoll]
– stress that you are testing the system, not them
– make the collected data as anonymous as possible
Often must get official approval for the use of human subjects
– there is a campus exception for class projects
User Test Proposal
A report that contains
– objective
– description of the system being tested
– hypotheses
– task environment & materials
– participants
– methodology
– tasks
– test measures
A good strategy: get this approved, then reuse it when writing up your results
Using the Test Results
Summarize the data
– make a list of all critical incidents (CIs)
» positive & negative
– include references back to the original data
– try to judge why each difficulty occurred
What does the data tell you?
– Did the UI work the way you thought it would?
– Is something missing?
Using the Results (cont.)
Update the task analysis and rethink the design
– rate the severity & ease of fixing each CI
– fix the severe problems & make the easy fixes
Will thinking aloud give the right answers?
– not always
– if you ask a question, people will always give an answer, even if it has nothing to do with the facts
» try to avoid specific questions
Measuring User Preference
How much users like or dislike the system
– often use Likert scales
– or have them choose among statements
» “best UI I’ve ever…”, “better than average”, …
– hard to be sure what the data will mean
» novelty of the UI, feelings, not a realistic setting, etc.
– Shneiderman’s QUIS is a general example (in the reader)
If many participants give you low ratings -> trouble
Can get some useful data by asking
– what they liked, disliked, where they had trouble, best part, worst part, etc. (redundant questions)
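A quick way to spot the "many low ratings -> trouble" signal is to summarize each questionnaire item; a sketch assuming 1–5 Likert ratings (the question names and ratings below are hypothetical):

```python
# Summarize 1-5 Likert ratings from a post-test questionnaire
# and flag items whose mean falls below the scale midpoint.
from statistics import mean

responses = {
    "easy to learn":        [4, 5, 3, 4, 5, 4],
    "error messages clear": [2, 1, 2, 3, 2, 2],
    "overall satisfaction": [4, 3, 4, 4, 5, 3],
}

for question, ratings in responses.items():
    avg = mean(ratings)
    flag = "  <- trouble?" if avg < 3.0 else ""
    print(f"{question:22s} mean={avg:.2f}{flag}")
```

As the slide warns, a flagged mean says *that* users are unhappy, not *why*; pair it with the open-ended questions (liked, disliked, best part, worst part).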
Formal Usability Studies
Situations in which these are useful
– to determine time requirements for task completion
– to compare two designs on measurable aspects
» time required
» number of errors
» effectiveness for achieving very specific tasks
Do not combine with thinking aloud
– talking can affect speed & accuracy (negatively & positively)
Require experiment design
Experiment Design
Experiment design involves determining how many experiments to run and which attributes to vary in each experiment
Goal: isolate which aspects of the interface really make a difference
Experiment Design
Decide on
– Response variables
» the outcome of the experiment
» usually the system performance
» aka dependent variable(s)
– Factors (aka attributes)
» aka independent variables
– Levels (aka values for attributes)
– Replication
» how often to repeat each combination of choices
Experiment Design
Studying a system, ignoring users: say we want to determine how to configure the hardware for a personal workstation (from Jain ’91, The Art of Computer Systems Performance Analysis)
– Hardware choices
» which CPU (three types)
» how much memory (four amounts)
» how many disk drives (from 1 to 3)
– Workload characteristics
» administration, management, scientific
Experiment Design
We want to isolate the effect of each component for the given workload type. How do we do this?
– WL1 CPU1 Mem1 Disk1
– WL1 CPU1 Mem1 Disk2
– WL1 CPU1 Mem1 Disk3
– WL1 CPU1 Mem2 Disk1
– WL1 CPU1 Mem2 Disk2
– …
There are (3 CPUs) × (4 memory sizes) × (3 disk sizes) × (3 workload types) = 108 combinations!
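One way to see where the 108 comes from is to enumerate the full-factorial design directly; a sketch of the workstation example (the level names are placeholders for the choices above):

```python
# A full-factorial design runs one experiment per combination
# of factor levels: 3 workloads x 3 CPUs x 4 memory sizes x 3 disk counts.
from itertools import product

factors = {
    "workload": ["admin", "mgmt", "scientific"],
    "cpu":      ["CPU1", "CPU2", "CPU3"],
    "memory":   ["Mem1", "Mem2", "Mem3", "Mem4"],
    "disks":    [1, 2, 3],
}

runs = list(product(*factors.values()))
print(len(runs))  # 3 * 3 * 4 * 3 = 108 experiments
```

The count grows multiplicatively with every factor added, which is exactly why the reduction strategies on the next slides matter.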
Experiment Design
One strategy to reduce the number of comparisons needed:
– pick just one attribute
– vary it
– hold the rest constant
Problems:
– inefficient
– might miss effects of interactions
Interactions among Attributes

Non-interacting:
       A1  A2
  B1    3   5
  B2    6   8

Interacting:
       A1  A2
  B1    3   5
  B2    6   9

[Figure: response plotted against A for each level of B — the B1 and B2 lines are parallel in the non-interacting case, and diverge in the interacting case.]
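The two tables above can be told apart numerically: in the non-interacting case, the effect of switching A1 -> A2 is the same at both levels of B, so the "difference of differences" is zero. A small sketch:

```python
# Detect a two-factor interaction in a 2x2 table as a non-zero
# difference of differences.
def interaction(table):
    """table[b][a] holds the response at level a of A and level b of B."""
    effect_of_A_at_B1 = table[0][1] - table[0][0]
    effect_of_A_at_B2 = table[1][1] - table[1][0]
    return effect_of_A_at_B2 - effect_of_A_at_B1  # 0 => purely additive

print(interaction([[3, 5], [6, 8]]))  # non-interacting table -> 0
print(interaction([[3, 5], [6, 9]]))  # interacting table -> 1
```

A one-attribute-at-a-time strategy can never observe this quantity, since it never varies A and B together.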
Experiment Design
Another strategy: figure out which attributes are important first
Do this by comparing just a few major attributes at a time
– if an attribute has a strong effect, include it in future studies
– otherwise assume it is safe to drop it
This strategy also allows you to find interactions between a few attributes
Special Considerations for Formal Studies with Human Participants
Studies involving human participants vs. measuring automated systems:
– people get tired
– people get bored
– people (may) get upset by some tasks
– learning effects
» people will learn how to do the tasks (or the answers to questions) if they are repeated
» people will (usually) learn how to use the system over time
More Special Considerations
High variability among people
– especially when involved in reading/comprehension tasks
– especially when following hyperlinks! (they can go all over the place)
Between Groups vs. Within Groups
Do participants see only one design, or both?
Between-groups experiment
– two groups of test users
– each group uses only one of the systems
Within-groups experiment
– one group of test users
» each person uses both systems
» can’t use the same tasks (learning)
– best for low-level interaction techniques
Why is this a consideration?
– People often learn during the experiment.
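Because people learn during the experiment, a within-groups study usually counterbalances which system each participant sees first, so learning does not systematically favor one design. A minimal sketch (the participant IDs and design names are hypothetical):

```python
# Counterbalance presentation order in a within-groups study:
# alternate which system each participant uses first.
participants = ["P1", "P2", "P3", "P4"]
systems = ("Design A", "Design B")

for i, p in enumerate(participants):
    # even-numbered participants start with Design A, odd with Design B
    order = systems if i % 2 == 0 else systems[::-1]
    print(p, "->", " then ".join(order))
```

With an even number of participants, each design is seen first equally often, so any practice effect is spread across both conditions.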
Summary
User testing is important, but takes time/effort
Use real tasks & representative participants
Be ethical & treat your participants well
Want to know what people are doing & why
– collect process data
– early testing can be done on mock-ups (low-fi)
More on formal studies next time.