[hci lab] week 5 ux goals and metrics
TRANSCRIPT
Lecture 5
UX Goals and Metrics
2015 Winter Internship Seminar @Yonsei HCI Lab, Track II: Prototypes and Evaluations. Class hours: Wed. 15:00–16:30. 4th February, 2015
INTRODUCTION
• What are the goals of your usability study?
– Are you trying to ensure optimal usability for a new piece of functionality?
– Are you benchmarking the user experience for an existing product?
• What are the goals of users?
– Do users complete a task and then stop using the product?
– Do users use the product numerous times on a daily basis?
• What is the appropriate evaluation method?
– How many participants are needed to get reliable feedback?
– How will collecting metrics impact the timeline and budget?
– How will the data be collected and analyzed?
Lecture #5 2015 Winter Internship @Yonsei HCI Lab 3
STUDY GOALS
• How will the data be used within the product development lifecycle?
• Two general ways to use data
– Formative
– Summative
STUDY GOALS
FORMATIVE: like a chef who periodically checks a dish while it’s being prepared and makes adjustments to positively impact the end result.
SUMMATIVE: like a restaurant critic who evaluates the dish after it is completed and compares the meal with other restaurants.
STUDY GOALS
• Formative Usability
– Evaluates product or design, identifies shortcomings, makes
recommendations
– Repeats process
• Attributes
– Iterative nature of testing with the goal of improving the
design
– Done before the design has been finalized
• Key Questions
– What are the most significant usability issues that are
preventing users from completing their goals or that are
resulting in inefficiencies?
– What aspects of the product work well for users? What do
they find frustrating?
– What are the most common errors or mistakes users are
making?
– Are improvements being made from one design iteration to
the next?
– What usability issues can you expect to remain after the
product is launched?
STUDY GOALS
• Summative Usability
– Goal is to evaluate how well a product or piece
of functionality meets its objectives
– Comparing several products to each other
– Focus on evaluating against a certain set of
criteria
• Key Questions
– Did we meet the usability goals of the project?
– How does our product compare against the
competition?
– Have we made improvements from one
product release to the next?
USER GOALS
• Need to know about users and what they are trying to
accomplish
– Forced to use the product every day as part of their jobs?
– Likely to use the product only once or twice?
– Is product a source of entertainment?
– Does user care about design aesthetic?
• Simplifies to two main aspects of the user experience
– Performance
– Satisfaction
USER GOALS
• Performance
– What the user does in interacting with the product
• Metrics (more in Ch 4)
– Degree of success in accomplishing a task or set of
tasks
– Time to perform each task
– Amount of effort to perform task
• Number of mouse clicks
• Cognitive effort
• Important in products that users don’t have choice in
how they are used
– If user can’t successfully complete key tasks, it will fail
USER GOALS
• Satisfaction
– What users say or think about their interaction
• Metrics (more in Ch 6)
– Ease of use
– Exceed expectations
– Visually appealing
– Trustworthy
• Important in products that users have choice in usage
STUDY DETAILS
• Budgets and Timelines
– Difficult to provide cost or time estimates for any particular type of study
• General rules of thumb
– Formative study
• Small number of participants (≤10)
• Relatively little impact on budget and timeline
– Lab setting with larger number of participants (>12)
• Most significant cost – recruiting and compensating participants
• Time required to run tests
• Additional cost for usability specialists
• Time to clean up and analyze data
– Online study
• Half of the time is spent setting up the study
• Running online study requires little if any time for usability specialist
• Other half of time spent cleaning up and analyzing data
• 100-200 person-hours (50% variation)
STUDY DETAILS
• Evaluation Methods
– Not restricted to certain type of method (lab test vs. online test)
– Choosing method based on how many participants and what metrics
you want to use
• Lab test with small number of participants
– One-on-one session between moderator and participant
– Participant thinking-aloud, moderator notes participant behavior and
responses to questions
– Metrics to collect
• Issue based metrics – issue frequency, type, severity
• Performance metrics – task success, errors, efficiency
• Self-reported metrics – answer questions regarding each task at the end of
study
• Caution
– Easy to overgeneralize performance and self-reported metrics without an
adequate sample size
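This caution can be made concrete: with small samples, a binomial confidence interval shows how wide the uncertainty around a task success rate really is. A minimal Python sketch using the adjusted-Wald (Agresti-Coull) interval, a common small-sample choice; the function name and the 7-of-8 example figures are invented for illustration.

```python
import math

def adjusted_wald_ci(successes, trials, z=1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a task
    success rate; z=1.96 gives a ~95% interval."""
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 7 of 8 participants succeeded: the point estimate is 87.5%, but the
# interval shows how little 8 participants really pin it down.
low, high = adjusted_wald_ci(7, 8)
print(f"95% CI: {low:.2f} - {high:.2f}")
```

With n = 8 the interval spans roughly 50% to 100%, which is exactly why performance metrics from small formative tests should not be over-generalized.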
STUDY DETAILS
• Evaluation Methods (continued)
• Lab test with larger number of participants
– Able to collect wider range of data because increased sample size means
increased confidence in data
• All performance, self-reported, and physiological metrics are fair game
– Caution
• Inferring website traffic patterns from usability lab data is not very reliable
• Looking at how subtle design changes impact user experience
• Online studies
– Testing with many participants at the same time
– Excellent way to collect a lot of data in a short time
– Able to collect many performance and self-reported metrics, and to evaluate
subtle design changes
– Caution
• Difficult to collect issue-based data, can’t directly observe participants
• Good for software or website testing, difficult to test consumer electronics
STUDY DETAILS
• Participants
– Have a major impact on findings
• Recruiting issues
– Identifying the recruiting criteria to determine if participant eligible
for study
• How to segment users
– How many users are needed
• Diversity of user population
• Complexity of product
• Specific goals of study
– Recruiting strategy
• Generate list from customer data
• Send requests via email distribution lists
• Third party
• Posting announcement on website
STUDY DETAILS
• Data Collection
– Plan how you are capturing data needed for study
– Significant impact on how much work later when analysis begins
• Lab test with small number of participants
– Excel works well
– Have template in place for quickly capturing data during testing
– Data entered in numeric format as much as possible
• 1 – success
• 0 – failure
– Everyone should know coding scheme extremely well
• Someone flips scales or doesn’t understand what to enter
• Throw out data or have to recode data
• Larger studies
– Use data capture tool
– Helpful to have the option to download raw data into Excel
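The 1/0 coding convention above pays off at analysis time, because success rates become simple means. A minimal sketch with an invented capture template (one row per participant, one column per task):

```python
# Invented capture template: success coded 1, failure coded 0, following
# the slide's coding scheme, so averages can be computed directly.
results = {
    "P1": [1, 1, 0],
    "P2": [1, 0, 0],
    "P3": [1, 1, 1],
}

NUM_TASKS = 3
rates = []
for task in range(NUM_TASKS):
    codes = [row[task] for row in results.values()]
    rates.append(sum(codes) / len(codes))
    print(f"Task {task + 1}: {rates[task]:.0%} success")
```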
STUDY DETAILS
• Data Cleanup
– Rarely in a format that is instantly ready to analyze
– Can take anywhere from one hour to a couple of weeks
• Cleanup tasks
– Filtering data
• Check for extreme values (task completion times)
• Some participants leave in the middle of study, and times are unusually
large
• Impossibly short times may indicate a user not truly engaged in the study
• Results from users who are not in target population
– Creating new variables
• Building new variables on the raw data is often useful
• May create a top-2-box variable for self-reported scales
• Aggregate overall success average representing all tasks
• Create an overall usability score
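The filtering and new-variable steps above can be sketched as one cleanup pass; the record fields, plausibility bounds, and top-2-box cutoff below are invented for illustration:

```python
# Illustrative cleanup pass over invented raw records: drop implausible
# task times, then build a top-2-box flag from a 7-point rating scale.
records = [
    {"participant": "P1", "task_time_s": 95,   "rating": 6},
    {"participant": "P2", "task_time_s": 2,    "rating": 7},  # implausibly fast
    {"participant": "P3", "task_time_s": 3600, "rating": 3},  # likely abandoned
    {"participant": "P4", "task_time_s": 140,  "rating": 5},
]

MIN_S, MAX_S = 10, 1200  # plausibility bounds chosen for this example
clean = [r for r in records if MIN_S <= r["task_time_s"] <= MAX_S]

# Top-2-box: a new variable derived from the raw 7-point rating.
for r in clean:
    r["top2box"] = 1 if r["rating"] >= 6 else 0

print([r["participant"] for r in clean])
print(sum(r["top2box"] for r in clean) / len(clean))
```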
STUDY DETAILS
• Cleanup tasks (continued)
– Verifying responses
• Notice large percentage of participants giving the same wrong
answer
• Check why this happens
– Checking consistency
• Make sure data were captured properly
• Check task completion times and success to self reported
metrics (completed fast but low rating)
– Data captured incorrectly
– Participant confused the scales of the question
– Transferring data
• Capture and clean up data in Excel, then use another program to
run statistics, then move to Excel to create charts and graphs
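The time-versus-rating consistency check described above can be sketched as a simple flagging rule; the thresholds and field names are invented:

```python
# Sketch of the consistency check: flag sessions where a task was
# completed quickly but rated hard, which often signals a flipped scale
# or a data-entry error.
sessions = [
    {"id": "P1", "time_s": 45,  "ease": 2},  # fast but rated hard: suspicious
    {"id": "P2", "time_s": 300, "ease": 2},
    {"id": "P3", "time_s": 50,  "ease": 6},
]

FAST_S, LOW_EASE = 60, 3  # ease on a 1-7 scale, 7 = very easy
flagged = [s["id"] for s in sessions
           if s["time_s"] < FAST_S and s["ease"] < LOW_EASE]
print(flagged)
```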
SUMMARY
• Formative vs. summative approach
– Formative – collecting data to help improve design before it is launched or released
– Summative – want to measure the extent to which certain target goals were achieved
• When deciding on the most appropriate metrics, take into account the two main aspects of the user experience –
performance and satisfaction
– Performance metrics – characterize what the user does
– Satisfaction metrics - relate to what users think or feel about their experience
• Budgets and timelines need to be planned well in advance when running any usability study
• Three general types of evaluation methods used to collect usability data
– Lab tests with small number of participants
• Best for formative testing
– Lab test with large number of participants (>12)
• Best for capturing a combination of qualitative and quantitative data
– Online studies with very large number of participants (>100)
• Best to examine subtle design changes and preferences
Lecture #5 2015 Winter Internship @Yonsei HCI Lab 19
SUMMARY
• Clearly identify criteria for recruiting participants
– Truly representative of target group
– Formative
• 6 to 8 users for each iteration is enough
• If distinct groups, helpful to have four from each group
– Summative
• 50 to 100 representative users
• Plan how you are going to capture all the data needed
– Template for quickly capturing data during test
– Everyone familiar with coding conventions
• Data cleanup
– Manipulating data in a way to make them usable and reliable
– Filtering removes extreme values or records that are problematic
– Consistency checks and verifying responses make sure participant intentions map to their responses
UX GOALS, METRICS, AND TARGETS
Hartson Chapter 10.
INTRODUCTION
Figure 10-1 You are here; the chapter on UX goals, metrics, and targets in the context of the overall Wheel lifecycle template.
UX GOALS
• Example: User Experience Goals for Ticket Kiosk System
– We can define the primary high-level UX goals for the ticket buyer to include:
• Fast and easy walk-up-and-use user experience, with absolutely no user training
• Fast learning so new user performance (after limited experience) is on par with that
of an experienced user [from AB-4-8]
• High customer satisfaction leading to high rate of repeat customers [from BC-6-16]
– Some other possibilities:
• High learnability for more advanced tasks [from BB-1-5]
• Draw, engagement, attraction
• Low error rate for completing transactions correctly, especially in the interaction
for payment [from CG-13-17]
UX TARGET TABLES
Table 10-1 Our UX target table, as evolved from the Whiteside, Bennett, and Holtzblatt (1988) usability specification table
WORK ROLES, USER CLASSES, AND UX GOALS
Work role & user class: Ticket buyer (casual new user, for occasional personal use)
UX goal: Walk-up ease of use for new user
Table 10-2 Choosing a work role, user class, and UX goal for a UX target
UX MEASURES
• Objective UX measures (directly measurable by evaluators)
– Initial performance
– Long-term performance (longitudinal, experienced, steady state)
– Learnability
– Retainability
– Advanced feature usage
• Subjective UX measures (based on user opinions)
– First impression (initial opinion, initial satisfaction)
– Long-term (longitudinal) user satisfaction
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression
Table 10-3 Choosing initial performance and first impression as UX measures
MEASURING INSTRUMENTS
• Benchmark Tasks
– Address designer questions with benchmark tasks and UX targets
– Selecting benchmark tasks
• Create benchmark tasks for a representative spectrum of user tasks.
• Start with short and easy tasks and then increase difficulty progressively.
• Include some navigation where appropriate.
• Avoid large amounts of typing (unless typing skill is being evaluated).
• Match the benchmark task to the UX measure.
• Adapt scenarios already developed for design.
• Use tasks in realistic combinations to evaluate task flow.
MEASURING INSTRUMENTS
• Do not forget to evaluate with your power users.
• To evaluate error recovery, a benchmark task can begin in an error state.
• Consider tasks to evaluate performance in “degraded modes” due to partial
equipment failure.
• Do not try to make a benchmark task for everything.
– Constructing benchmark task content
• Remove any ambiguities with clear, precise, specific, and repeatable instructions.
• Tell the user what task to do, but not how to do it.
• Do not use words in benchmark tasks that appear specifically in the interaction
design.
MEASURING INSTRUMENTS
• Use work context and usage-centered wording, not system-oriented wording.
• Have clear start and end points for timing.
• Keep some mystery in it for the user.
• Annotate situations where evaluators must ensure pre-conditions for running
benchmark tasks.
• Use “rubrics” for special instructions to evaluators.
• Put each benchmark task on a separate sheet of paper.
• Write a “task script” for each benchmark task.
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | (not yet chosen)
Table 10-4 Choosing “buy special event ticket” benchmark task as measuring instrument for “initial performance” UX measure in first UX target
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | (not yet chosen)
Table 10-5 Choosing “buy movie ticket” benchmark task as measuring instrument for second initial performance UX measure
MEASURING INSTRUMENTS
– How many benchmark tasks and UX targets do you need?
– Ensure ecological validity: as you write your benchmark task descriptions, ask how
the setting can be made more realistic
• What are constraints in user or work context?
• Does the task involve more than one person or role?
• Does the task require a telephone or other physical props?
• Does the task involve background noise?
• Does the task involve interference or interruption?
• Does the user have to deal with multiple simultaneous inputs, for example,
multiple audio feeds through headsets?
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire
Table 10-6 Choosing questionnaire as measuring instrument for first-impression UX measure
MEASURING INSTRUMENTS
UX goal | UX measure | Measuring instrument / UX metric
Ease of first-time use | Initial performance | Time on task
Ease of learning | Learnability | Time on task or error rate, after a given amount of use and compared with initial performance
High performance for experienced users | Long-term performance | Time and error rates
Low error rates | Error-related performance | Error rates
Error avoidance in safety-critical tasks | Task-specific error performance | Error count, with strict target levels (much more important than time on task)
Error recovery performance | Task-specific time performance | Time on recovery portion of the task
Overall user satisfaction | User satisfaction | Average score on questionnaire
User attraction to product | User opinion of attractiveness | Average score on questionnaire, with questions focused on the effectiveness of the “draw” factor
Quality of user experience | User opinion of overall experience | Average score on questionnaire, with questions focused on quality of the overall user experience, including specific points about your product that might be associated most closely with emotional impact factors
Overall user satisfaction | User satisfaction | Average score on questionnaire, with questions focusing on willingness to be a repeat customer and to recommend the product to others
Continuing ability of users to perform without relearning | Retainability | Time on task and error rates, re-evaluated after a period of time off (e.g., a week)
Avoid having users walk away in dissatisfaction | User satisfaction, especially initial satisfaction | Average score on questionnaire, with questions focusing on initial impressions and satisfaction
Table 10-7 Close connections among UX goals, UX measures, and measuring instruments
UX METRICS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire | Average rating across users and across questions
Table 10-8 Choosing UX metrics for UX measures
SETTING LEVELS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric | Baseline level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 minutes
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10
Table 10-9 Setting baseline levels for UX measures
SETTING LEVELS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric | Baseline level | Target level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 min, as measured at the MUTTS ticket counter | 2.5 min
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1 | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10 | 8/10
Ticket buyer: Frequent music patron | Accuracy | Experienced usage error rate | BT3: Buy concert ticket | Average number of errors | <1 | <1
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average time on task | 5 min (online system) | 2.5 min
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average number of errors | <1 | <1
Casual public ticket buyer | Initial customer satisfaction | First impression | QUIS questions 4–7, 10, 13 | Average rating across users and across questions | 6/10 | 8/10
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT5: Buy Almost Famous movie tickets | Average time on task | 5 min (including review) | 2 min
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT6: Buy Ben Harper concert tickets | Average number of errors | <1 | <1
Table 10-10 Setting target levels for UX metrics
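A UX target table like Table 10-10 can also be kept as structured data so that observed results are checked against target levels mechanically. A minimal sketch: the rows paraphrase two entries from the table, the observed values are invented, and both metrics here happen to be "lower is better".

```python
# UX target rows kept as structured data so observed results can be
# checked mechanically against target levels.
targets = [
    {"task": "BT1: Buy special event ticket",
     "metric": "average time on task (min)", "baseline": 3.0, "target": 2.5},
    {"task": "BT2: Buy movie ticket",
     "metric": "average number of errors", "baseline": 1.0, "target": 1.0},
]

# Hypothetical observed results from an evaluation session.
observed = {"BT1: Buy special event ticket": 2.3,
            "BT2: Buy movie ticket": 0.4}

met = {}
for row in targets:
    value = observed[row["task"]]
    met[row["task"]] = value <= row["target"]  # lower is better for both
    print(f"{row['task']}: observed {value} vs. target {row['target']} -> "
          f"{'met' if met[row['task']] else 'not met'}")
```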
PRACTICAL TIPS AND CAUTIONS FOR CREATING UX TARGETS
• Are user classes for each work role specified clearly enough?
– Have you taken into account potential trade-offs among user groups?
– Are the values for the various levels reasonable?
– Be prepared to adjust your target level values, based on initial observed
results
– Remember that the target level values are averages.
– How well do the UX measures capture the UX goals for the design?
– What if the design is in its early stages and you know the design will change
significantly in the next version, anyway?
– What about UX goals, metrics, and targets for usefulness and emotional
impact?
Choosing the Right Metrics: Ten Types of Usability Studies
• Issue Based Metrics (Ch 5)
– Anything that prevents task completion
– Anything that takes someone off course
– Anything that creates some level of confusion
– Anything that produces an error
– Not seeing something that should be noticed
– Assuming something should be correct when it is not
– Assuming a task is complete when it is not
– Performing the wrong action
– Misinterpreting some piece of content
– Not understanding the navigation
Metric types: Task Success, Task Time, Errors, Efficiency, Learnability, Issue Based Metrics, Self Reported Metrics, Behavioral and Physiological Metrics, Combined and Comparative Metrics, Live Website Metrics, Card Sorting Data
Choosing the Right Metrics: Ten Types of Usability Studies
• Self Reported Metrics (Ch 6): Asking participants for information about their
perception of the system and their interaction with it
– Overall interaction
– Ease of use
– Effectiveness of navigation
– Awareness of certain features
– Clarity of terminology
– Visual appeal
– Likert scales
– Semantic differential scales
– After-scenario questionnaire
– Expectation measures
– Usability Magnitude Estimation
– SUS
– CSUQ (Computer System Usability Questionnaire)
– QUIS (Questionnaire for User Interface Satisfaction)
– WAMMI (Website Analysis & Measurement Inventory)
– Product Reaction Cards
Choosing the Right Metrics: Ten Types of Usability Studies
• Behavioral and Physiological Metrics (Ch 7)
– Verbal Behaviors
• Strongly positive comment
• Strongly negative comment
• Suggestion for improvement
• Question
• Variation from expectation
• Stated confusion/frustration
– Nonverbal Behaviors
• Frowning/Grimacing/Unhappy
• Smiling/Laughing/Happy
• Surprised/Unexpected
• Furrowed brow/Concentration
• Evidence of impatience
• Leaning in close to screen
• Fidgeting in chair
• Rubbing head/eyes/neck
Choosing the Right Metrics: Ten Types of Usability Studies
• Combined and Comparative Metrics (Ch 8)
– Taking smaller pieces of raw data, like
task completion rates, time-on-task, and
self-reported ease of use, to derive new
metrics such as an overall usability
metric or a usability scorecard
– Comparing existing usability data to
expert or ideal results
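One simple way, among many, to combine raw measures into a single usability score is to rescale each metric to a 0-1 "higher is better" range and average. A sketch; the scale bounds and equal weighting below are illustrative choices, not a standard formula:

```python
# Combine task success, time-on-task, and a self-reported rating into one
# score by rescaling each to 0-1 and averaging (illustrative weighting).
def combined_score(success_rate, time_s, rating, max_time_s=300, rating_max=7):
    time_score = 1 - min(time_s, max_time_s) / max_time_s  # faster -> higher
    rating_score = rating / rating_max
    return (success_rate + time_score + rating_score) / 3

score = combined_score(0.8, 120, 6)  # 80% success, 120 s, rating 6 of 7
print(round(score, 3))
```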
Choosing the Right Metrics: Ten Types of Usability Studies
• Live Website Metrics (Ch 9)
– Information you can glean from live data
on a production website
• Server logs – page views and visits
• Click-through rates – number of times a link is shown vs.
actually clicked
• Drop-off rates – abandoned processes
• A/B studies – manipulate the pages users
see and compare metrics between them
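The click-through and A/B ideas above can be sketched with a two-proportion z-test (normal approximation); all counts below are invented:

```python
import math

# Compare click-through rates for two page variants (A/B study) with a
# two-proportion z-test under the normal approximation.
def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

ctr_a, ctr_b, z, p = two_proportion_z(120, 2400, 90, 2400)
print(f"CTR A {ctr_a:.1%}, CTR B {ctr_b:.1%}, z={z:.2f}, p={p:.3f}")
```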
Choosing the Right Metrics: Ten Types of Usability Studies
• Card Sorting Data (Ch 9)
– Open card sort
• Give participants cards, they sort and
define groups
– Closed card sort
• Give participants cards and name of
groups, they put cards into groups
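A common first analysis step for open card sort data is a co-occurrence count of how often each pair of cards landed in the same group; the sorts below are invented:

```python
from itertools import combinations

# Count how often each pair of cards was placed in the same group across
# participants' open sorts; high counts suggest cards users see as related.
sorts = [
    {"Nav": ["home", "search"], "Buy": ["cart", "checkout"]},
    {"Find": ["home", "search", "cart"], "Pay": ["checkout"]},
]

pair_counts = {}
for participant in sorts:
    for group in participant.values():
        for a, b in combinations(sorted(group), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

print(pair_counts)
```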
Choosing the Right Metrics: Ten Types of Usability Studies
• Increasing Awareness
– Aimed at increasing awareness of a specific piece of content
or functionality
– Why is something not noticed or used?
• Metrics
– Live Website Metrics
• Monitor interactions
• Not foolproof – user may notice and decide not to click,
alternatively user may click but not notice interaction
• A/B testing to see how small changes impact user behavior
– Self Reported Metrics
• Pointing out specific elements to user and asking whether
they had noticed those elements during task
• Ask whether they were aware of the feature before the study began
– Not everyone has good memory
• Show users different elements and ask them to choose
which one they saw during task
– Behavioral and Physiological Metrics
• Eye tracking
– Determine whether participants looked at a certain element
– Average time spent looking at a certain element
Choosing the Right Metrics: Ten Types of Usability Studies
• Problem Discovery
– Identify major usability issues
– After deployment, find out what annoys users
– Periodic checkup to see how users are interacting with
the product
• Discovery vs. usability study
– Open-ended
– Participants may generate own tasks
– Strive for realism in typical task and in user’s
environment
– Comparing across participants can be difficult
• Metrics
– Issue Based Metrics
• Capture all usability issues, you can convert into type
and frequency
• Assign severity rating and develop a quick-hit list of
design improvements
– Self Reported Metrics
Choosing the Right Metrics: Ten Types of Usability Studies
• Creating an Overall Positive User Experience
– Not enough to be usable, want exceptional user
experience
– Thought provoking, entertaining, slightly addictive
– Performance useful, but what user thinks, feels, and
says really matters
• Metrics
– Self Reported
• Satisfaction – common but not enough
• Exceed expectations – want user to say it was easier,
more efficient, or more entertaining than expected
• Likelihood to purchase, use in future
• Recommend to a friend
• Behavioral and Physiological
– Pupil diameter
– Heart rate
– Skin conductance
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs
– Comparing more than one design alternative
– Early in the design process teams put together semi-functional prototypes
– Evaluate using predefined set of metrics
• Participants
– Can’t ask the same participant to perform the same tasks with
all designs
– Even with counterbalancing of design and task order, carryover
effects limit the value of the data
• Procedure
– Run the study as between-subjects; each participant works with
only one design
– Alternatively, have a primary design each participant works with, then
show the alternative designs and ask for a preference
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs (continued)
• Metrics
– Task Success
• Indicates which design is more usable
• With a small sample size, of limited value
– Task Time
• Indicates which design is more usable
• With a small sample size, of limited value
– Issue Based Metrics
• Compare the frequency of high-, medium-, and
low-severity issues across designs to see which one is
most usable
– Self Reported Metrics
• Ask participant to choose the prototype they would
most like to use in the future (forced comparison)
• Ask participants to rate each prototype along
dimensions such as ease of use and visual appeal
Independent & Dependent Variables
Independent variables:
– The things you manipulate or control for
– Aspects of a study that you manipulate
– Chosen based on the research question
– e.g.:
• Characteristics of participants (e.g., age, sex, relevant experience)
• Different designs or prototypes being tested
• Tasks

Dependent variables:
– The things you measure
– Describe what happened as a result of the study
– Something you measure as the result of, or as dependent on, how you manipulate the independent variables
– e.g.:
• Task Success
• Task Time
• SUS score
• etc.
Need to have a clear idea of what you plan to manipulate and what you plan to measure
Designing a Usability Study
RQ 1
• Research Question: Differences in performance between males and females
• Independent variable: Gender
• Dependent variable: Task completion time

RQ 2
• Research Question: Differences in satisfaction between novice and expert users
• Independent variable: Experience level
• Dependent variable: Satisfaction
Types of Data
• Nominal (aka Categorical)
– e.g., Male, Female; Design A, Design B.
• Ordinal
– e.g., Rank ordering of 4 designs tested from Most Visually Appealing to
Least Visually Appealing.
• Interval
– e.g., 7-point scale of agreement: “This design is visually appealing.
Strongly Disagree . . . Strongly Agree”
• Ratio
– e.g., Time, Task Success %
NOMINAL DATA
• Definition
– Unordered groups or categories
– Without order, cannot say one is better than another
• May provide characteristics of users, independent variables that allow you to segment
data
– Windows versus Mac users
– Geographical location
– Males versus females
• What about dependent variables?
– Number of users who clicked on A vs. B
– Task success
• Usage
– Counts and frequencies
ORDINAL DATA
• Definition
– Ordered groups and categories
– Data is ordered in a certain way but intervals between measurements are not
meaningful
• Ordinal data comes from self-reported data on questionnaires
– Website rated as excellent, good, fair, or poor
– Severity rating of problem encountered as high, medium, or low
• Usage
– Looking at frequencies
– Calculating an average is meaningless (the distance between high and medium may not be the same as between medium and low)
INTERVAL DATA
• Definition
– Continuous data where differences between the measurements are meaningful
– Zero point on the scale is arbitrary
• System Usability Scale (SUS)
– Example of interval data
– Based on self-reported data from a series of questions about overall usability
– Scores range from 0 to 100
• Higher score indicates better usability
• Distance between points is meaningful because it indicates an increase/decrease in perceived usability
• Usage
– Able to calculate descriptive statistics such as average, standard deviation, etc.
– Inferential statistics can be used to generalize to a population
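The SUS score mentioned above is computed with a simple, well-known recipe: odd (positively worded) items contribute their rating minus 1, even (negatively worded) items contribute 5 minus their rating, and the sum is scaled by 2.5 to land on 0–100. A minimal sketch with hypothetical responses from one participant:

```python
def sus_score(responses):
    """SUS score from ten 1-5 Likert responses, using the standard
    scoring: odd items contribute r - 1, even items contribute 5 - r,
    and the sum is scaled by 2.5 onto a 0-100 range."""
    assert len(responses) == 10, "SUS has exactly ten items"
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Hypothetical answers from one participant (items 1-10)
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```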
Ordinal vs. Interval Rating Scales
• Are these two scales different?
• Top scale is ordinal. You should only calculate frequencies of each
response.
• Bottom scale can be considered interval. You can also calculate
means.
RATIO DATA
• Definition
– Same as interval data, with the addition of an absolute zero
– Zero has inherent meaning
• Example
– The difference between a person aged 35 and one aged 38 is the same as the difference between people aged 12 and 15
– With time to completion, you can say that one participant is twice as fast as another
• Usage
– Most analyses work with both ratio and interval data
– The geometric mean is an exception; it requires ratio data
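To see why the geometric mean needs ratio data: it multiplies values and takes the n-th root, so an arbitrary zero point (as in interval data) would distort it. A short sketch with hypothetical task-completion times:

```python
from math import prod

def geometric_mean(values):
    """Geometric mean: the n-th root of the product of n values.
    Meaningful only for ratio data (e.g., task times), where zero
    is absolute and ratios between values make sense."""
    return prod(values) ** (1 / len(values))

# Hypothetical task-completion times in seconds
times = [30, 45, 60, 120]
print(round(geometric_mean(times), 1))  # 55.8
```

The geometric mean is often preferred over the arithmetic mean for task times because time data is typically skewed by a few slow participants.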
Confidence Intervals
• Assume this was your time data for a study with 5 participants:
Does that make a difference in your answer?
Calculating Confidence Intervals
=CONFIDENCE(<alpha>, <std dev>, <n>)
– <alpha> is normally .05 (for a 95% confidence interval)
– <std dev> is the standard deviation of the set of numbers (9.6 in this example)
– <n> is how many numbers are in the set (5 in this example)
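The same margin of error can be computed outside Excel. A minimal Python sketch of what =CONFIDENCE() does (the two-tailed normal critical value times the standard error), using the values from the slide:

```python
from math import sqrt
from statistics import NormalDist

def confidence(alpha, std_dev, n):
    """Margin of error for a mean, mirroring Excel's =CONFIDENCE():
    the two-tailed normal critical value times the standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return z * std_dev / sqrt(n)

# Values from the slide: alpha = .05, std dev = 9.6, n = 5
print(round(confidence(0.05, 9.6, 5), 2))  # 8.41
```

The mean plus or minus this margin gives the 95% confidence interval; note that with only 5 participants, a t-based interval would be somewhat wider than this normal-based one.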
Excel Example
Show Error Bars
Excel Example
Binary Success
• Pass/fail (or other binary criteria)
• 1’s (success) and 0’s (failure)
Confidence Interval for Task Success
• When you look at task success data across participants for a single
task the data is commonly binary:
– Each participant either passed or failed on the task.
• In this situation, you need to calculate the confidence interval using
the binomial distribution.
Example
– The easiest way to calculate the confidence interval is to use Jeff Sauro's web calculator:
– http://www.measuringusability.com/wald.htm
1=success, 0=failure. So, 6/8 succeeded, or 75%.
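If the calculator is unavailable, the adjusted-Wald (Agresti-Coull) interval, a common choice for small-sample task-success data, is easy to compute by hand. A sketch using the slide's 6-of-8 example:

```python
from math import sqrt
from statistics import NormalDist

def adjusted_wald(successes, trials, alpha=0.05):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a
    binomial proportion: add z^2/2 successes and z^2 trials,
    then apply the ordinary Wald formula to the adjusted rate."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = (successes + z * z / 2) / (trials + z * z)
    half = z * sqrt(p * (1 - p) / (trials + z * z))
    return max(0.0, p - half), min(1.0, p + half)

# Slide example: 6 of 8 participants succeeded (75%)
low, high = adjusted_wald(6, 8)
print(f"{low:.0%} to {high:.0%}")
```

The interval is wide (roughly 40% to 94%), which is exactly the point: with 8 participants, an observed 75% success rate pins the true rate down only loosely.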
Chi-square
• Allows you to compare actual and expected frequencies for
categorical data.
=CHITEST(<actual range>,<expected range>)
Excel Example
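What =CHITEST() does can be sketched in a few lines: compute Pearson's chi-square statistic from the observed and expected counts, then convert it to a p-value. The sketch below uses the closed-form chi-square survival function, which holds only for even degrees of freedom; the data is hypothetical:

```python
from math import exp

def chi_square_p(actual, expected):
    """p-value for Pearson's chi-square goodness-of-fit test,
    analogous to =CHITEST(actual, expected). The closed-form
    survival function below is valid for EVEN degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    stat = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
    df = len(actual) - 1  # one row of categories
    assert df % 2 == 0, "this closed form needs an even df"
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (stat / 2) / i
        total += term
    return exp(-stat / 2) * total

# Hypothetical data: first clicks of 100 users across 3 designs,
# tested against an expectation of equal preference
p = chi_square_p([20, 45, 35], [100 / 3] * 3)
print(round(p, 4))  # 0.0087
```

A p-value below .05 here would suggest that users' design preferences are not uniform.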
Comparing Means
T-test
• Independent samples (between subjects)
– Apollo websites, task times
T-test
• Paired samples (within subjects)
– Haptic mouse study
T-tests in Excel
=TTEST(<array1>, <array2>, x, y)
x = 2 (for a two-tailed test) in almost all cases
y = 2 for independent samples; y = 1 for paired samples
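Behind =TTEST(), Excel computes a t statistic and looks up its p-value. The statistics themselves are short to compute; a stdlib-only sketch of the equal-variance independent form and the paired form, with hypothetical task times:

```python
from math import sqrt
from statistics import mean, stdev

def t_independent(a, b):
    """t statistic for two independent samples (the equal-variance
    form that =TTEST(..., 2, 2) assumes)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 +
                  (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def t_paired(a, b):
    """t statistic for paired samples (within subjects), as in
    =TTEST(..., 2, 1): a one-sample t-test on the differences."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical task times (seconds) for two designs
design_a = [34, 41, 29, 38, 45]
design_b = [28, 33, 30, 25, 31]
print(round(t_independent(design_a, design_b), 2))  # 2.59
print(round(t_paired(design_a, design_b), 2))       # 2.96
```

The t statistic is then compared against the t distribution with the appropriate degrees of freedom to get the p-value, which is the step Excel performs for you.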
Comparing Multiple Means
• Analysis of Variance (ANOVA)
“Tools” > “Data Analysis” > “Anova: Single Factor”
Excel example: study comparing 4 navigation approaches for a website
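The F statistic that Excel's Anova: Single Factor reports is the between-group mean square divided by the within-group mean square. A sketch with hypothetical task times for four navigation approaches:

```python
from statistics import mean

def anova_f(*groups):
    """F statistic for a single-factor ANOVA: mean square between
    groups divided by mean square within groups."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical task times (seconds) for 4 navigation approaches
f = anova_f([30, 32, 28], [35, 38, 36], [31, 29, 33], [40, 42, 41])
print(round(f, 1))  # 27.6
```

A large F (relative to the F distribution with k−1 and n−k degrees of freedom) indicates that at least one navigation approach differs from the others; follow-up pairwise comparisons are then needed to find which.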
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Goal
– To gain experience in writing effective benchmark tasks and measurable UX targets.
• Activities
– We have shown you a rather complete set of examples of benchmark tasks and UX targets for the Ticket Kiosk
System. Your job is to do something similar for the system of your choice.
– Begin by identifying which work roles and user classes you are targeting in evaluation (brief description is
enough).
– Write three or more UX table entries (rows), including your choices for each column. Have at least two UX
targets based on a benchmark task and at least one based on a questionnaire.
– Create and write up a set of about three benchmark tasks to go with the UX targets in the table.
• Do NOT make the tasks too easy.
• Make tasks increasingly complex.
• Include some navigation.
• Create tasks that you can later “implement” in your low-fidelity rapid prototype.
• The expected average performance time for each task should be no more than about 3 minutes, just to keep it
short and simple for you during evaluation.
– Include the questionnaire question numbers in the measuring instrument column of the appropriate UX target.
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Cautions and hints:
– Do not spend any time on design in this exercise; there will be time for detailed design in
the next exercise.
– Do not plan to give users any training.
• Deliverables:
– Two user benchmark tasks, each on a separate sheet of paper.
– Three or more UX targets entered into a blank UX target table on your laptop or on paper.
– If you are doing this exercise in a classroom environment, finish up by reading your
benchmark tasks to the class for critique and discussion.
• Schedule
– Work efficiently and complete in about an hour and a half.