[hci lab] week 5 ux goals and metrics
TRANSCRIPT
Lecture 5
UX Goals and Metrics
2015 Winter Internship Seminar @Yonsei HCI Lab, Track II: Prototypes and Evaluations. Class hours: Wed. 15:00–16:30. 4th February, 2015
INTRODUCTION
• What are the goals of your usability study?
– Are you trying to ensure optimal usability for a new piece of functionality?
– Are you benchmarking the user experience for an existing product?
• What are the goals of users?
– Do users complete a task and then stop using the product?
– Do users use the product numerous times on a daily basis?
• What is the appropriate evaluation method?
– How many participants are needed to get reliable feedback?
– How will collecting metrics impact the timeline and budget?
– How will the data be collected and analyzed?
Lecture #5 2015 Winter Internship @Yonsei HCI Lab 3
STUDY GOALS
• How will the data be used within the product development lifecycle?
• Two general ways to use data
– Formative
– Summative
STUDY GOALS
FORMATIVE: like a chef who periodically checks a dish while it’s being prepared and makes adjustments to positively impact the end result.
SUMMATIVE: like a restaurant critic who evaluates the dish after it is completed and compares the meal with other restaurants.
STUDY GOALS
• Formative Usability
– Evaluates product or design, identifies shortcomings, makes
recommendations
– Repeats process
• Attributes
– Iterative nature of testing with the goal of improving the
design
– Done before the design has been finalized
• Key Questions
– What are the most significant usability issues that are
preventing users from completing their goals or that are
resulting in inefficiencies?
– What aspects of the product work well for users? What do
they find frustrating?
– What are the most common errors or mistakes users are
making?
– Are improvements being made from one design iteration to
the next?
– What usability issues can you expect to remain after the
product is launched?
STUDY GOALS
• Summative Usability
– Goal is to evaluate how well a product or piece
of functionality meets its objectives
– Comparing several products to each other
– Focus on evaluating against a certain set of
criteria
• Key Questions
– Did we meet the usability goals of the project?
– How does our product compare against the
competition?
– Have we made improvements from one
product release to the next?
USER GOALS
• Need to know about users and what they are trying to
accomplish
– Forced to use the product every day as part of their jobs?
– Likely to use the product only once or twice?
– Is product a source of entertainment?
– Does user care about design aesthetic?
• Simplifies to two main aspects of the user experience
– Performance
– Satisfaction
USER GOALS
• Performance
– What the user does in interacting with the product
• Metrics (more in Ch 4)
– Degree of success in accomplishing a task or set of
tasks
– Time to perform each task
– Amount of effort to perform task
• Number of mouse clicks
• Cognitive effort
• Important in products that users don’t have choice in
how they are used
– If user can’t successfully complete key tasks, it will fail
USER GOALS
• Satisfaction
– What users say or think about their interaction
• Metrics (more in Ch 6)
– Ease of use
– Exceed expectations
– Visually appealing
– Trustworthy
• Important in products that users have choice in usage
STUDY DETAILS
• Budgets and Timelines
– Difficult to provide cost or time estimates for any particular type of study
• General rules of thumb
– Formative study
• Small number of participants (≤10)
• Relatively little impact on budget and timeline
– Lab setting with larger number of participants (>12)
• Most significant cost – recruiting and compensating participants
• Time required to run tests
• Additional cost for usability specialists
• Time to clean up and analyze data
– Online study
• Half of the time is spent setting up the study
• Running online study requires little if any time for usability specialist
• Other half of time spent cleaning up and analyzing data
• 100-200 person-hours (50% variation)
STUDY DETAILS
• Evaluation Methods
– Not restricted to certain type of method (lab test vs. online test)
– Choosing method based on how many participants and what metrics
you want to use
• Lab test with small number of participants
– One-on-one session between moderator and participant
– Participant thinking-aloud, moderator notes participant behavior and
responses to questions
– Metrics to collect
• Issue based metrics – issue frequency, type, severity
• Performance metrics – task success, errors, efficiency
• Self-reported metrics – answer questions regarding each task at the end of
study
• Caution
– Easy to overgeneralize performance and self-reported metrics without an
adequate sample size
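This caution can be made concrete: with small samples, a binomial confidence interval shows how wide the uncertainty around a task success rate really is. A minimal Python sketch using the adjusted-Wald (Agresti-Coull) interval, a common small-sample choice; the function name and the 7-of-8 example figures are invented for illustration.

```python
import math

def adjusted_wald_ci(successes, trials, z=1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a task
    success rate; z=1.96 gives a ~95% interval."""
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 7 of 8 participants succeeded: the point estimate is 87.5%, but the
# interval shows how little 8 participants really pin it down.
low, high = adjusted_wald_ci(7, 8)
print(f"95% CI: {low:.2f} - {high:.2f}")
```

With n = 8 the interval spans roughly 50% to 100%, which is exactly why performance metrics from small formative tests should not be over-generalized.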
STUDY DETAILS
• Evaluation Methods (continued)
• Lab test with larger number of participants
– Able to collect wider range of data because increased sample size means
increased confidence in data
• All performance, self-reported, and physiological metrics are fair game
– Caution
• Inferring website traffic patterns from usability lab data is not very reliable
• Looking at how subtle design changes impact user experience
• Online studies
– Testing with many participants at the same time
– Excellent way to collect a lot of data in a short time
– Able to collect many performance and self-reported metrics, and to evaluate
subtle design changes
– Caution
• Difficult to collect issue-based data, can’t directly observe participants
• Good for software or website testing, difficult to test consumer electronics
STUDY DETAILS
• Participants
– Have a major impact on findings
• Recruiting issues
– Identifying the recruiting criteria to determine if participant eligible
for study
• How to segment users
– How many users are needed
• Diversity of user population
• Complexity of product
• Specific goals of study
– Recruiting strategy
• Generate list from customer data
• Send requests via email distribution lists
• Third party
• Posting announcement on website
STUDY DETAILS
• Data Collection
– Plan how you are capturing data needed for study
– Significant impact on how much work later when analysis begins
• Lab test with small number of participants
– Excel works well
– Have template in place for quickly capturing data during testing
– Data entered in numeric format as much as possible
• 1 – success
• 0 – failure
– Everyone should know coding scheme extremely well
• Someone flips scales or doesn’t understand what to enter
• Throw out data or have to recode data
• Larger studies
– Use data capture tool
– Helpful to have the option to download raw data into Excel
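The 1/0 coding convention above pays off at analysis time, because success rates become simple means. A minimal sketch with an invented capture template (one row per participant, one column per task):

```python
# Invented capture template: success coded 1, failure coded 0, following
# the slide's coding scheme, so averages can be computed directly.
results = {
    "P1": [1, 1, 0],
    "P2": [1, 0, 0],
    "P3": [1, 1, 1],
}

NUM_TASKS = 3
rates = []
for task in range(NUM_TASKS):
    codes = [row[task] for row in results.values()]
    rates.append(sum(codes) / len(codes))
    print(f"Task {task + 1}: {rates[task]:.0%} success")
```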
STUDY DETAILS
• Data Cleanup
– Rarely in a format that is instantly ready to analyze
– Can take anywhere from one hour to a couple of weeks
• Cleanup tasks
– Filtering data
• Check for extreme values (task completion times)
• Some participants leave in the middle of study, and times are unusually
large
• Impossibly short times may indicate a user not truly engaged in the study
• Results from users who are not in target population
– Creating new variables
• Building new variables on the raw data is often useful
• May create a top-2-box variable for self-reported scales
• Aggregate overall success average representing all tasks
• Create an overall usability score
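The filtering and new-variable steps above can be sketched as one cleanup pass; the record fields, plausibility bounds, and top-2-box cutoff below are invented for illustration:

```python
# Illustrative cleanup pass over invented raw records: drop implausible
# task times, then build a top-2-box flag from a 7-point rating scale.
records = [
    {"participant": "P1", "task_time_s": 95,   "rating": 6},
    {"participant": "P2", "task_time_s": 2,    "rating": 7},  # implausibly fast
    {"participant": "P3", "task_time_s": 3600, "rating": 3},  # likely abandoned
    {"participant": "P4", "task_time_s": 140,  "rating": 5},
]

MIN_S, MAX_S = 10, 1200  # plausibility bounds chosen for this example
clean = [r for r in records if MIN_S <= r["task_time_s"] <= MAX_S]

# Top-2-box: a new variable derived from the raw 7-point rating.
for r in clean:
    r["top2box"] = 1 if r["rating"] >= 6 else 0

print([r["participant"] for r in clean])
print(sum(r["top2box"] for r in clean) / len(clean))
```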
STUDY DETAILS
• Cleanup tasks (continued)
– Verifying responses
• Notice large percentage of participants giving the same wrong
answer
• Check why this happens
– Checking consistency
• Make sure data were captured properly
• Check task completion times and success to self reported
metrics (completed fast but low rating)
– Data captured incorrectly
– Participant confused the scales of the question
– Transferring data
• Capture and clean up data in Excel, then use another program to
run statistics, then move to Excel to create charts and graphs
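The time-versus-rating consistency check described above can be sketched as a simple flagging rule; the thresholds and field names are invented:

```python
# Sketch of the consistency check: flag sessions where a task was
# completed quickly but rated hard, which often signals a flipped scale
# or a data-entry error.
sessions = [
    {"id": "P1", "time_s": 45,  "ease": 2},  # fast but rated hard: suspicious
    {"id": "P2", "time_s": 300, "ease": 2},
    {"id": "P3", "time_s": 50,  "ease": 6},
]

FAST_S, LOW_EASE = 60, 3  # ease on a 1-7 scale, 7 = very easy
flagged = [s["id"] for s in sessions
           if s["time_s"] < FAST_S and s["ease"] < LOW_EASE]
print(flagged)
```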
SUMMARY
• Formative vs. summative approach
– Formative – collecting data to help improve design before it is launched or released
– Summative – want to measure the extent to which certain target goals were achieved
• When deciding on the most appropriate metrics, take into account the two main aspects of the user experience –
performance and satisfaction
– Performance metrics – characterize what the user does
– Satisfaction metrics - relate to what users think or feel about their experience
• Budgets and timelines need to be planned well in advance when running any usability study
• Three general types of evaluation methods used to collect usability data
– Lab tests with small number of participants
• Best for formative testing
– Lab test with large number of participants (>12)
• Best for capturing a combination of qualitative and quantitative data
– Online studies with very large number of participants (>100)
• Best to examine subtle design changes and preferences
Lecture #5 2015 Winter Internship @Yonsei HCI Lab 19
SUMMARY
• Clearly identify criteria for recruiting participants
– Truly representative of target group
– Formative
• 6 to 8 users for each iteration is enough
• If distinct groups, helpful to have four from each group
– Summative
• 50 to 100 representative users
• Plan how you are going to capture all the data needed
– Template for quickly capturing data during test
– Everyone familiar with coding conventions
• Data cleanup
– Manipulating data in a way to make them usable and reliable
– Filtering removes extreme values or records that are problematic
– Consistency checks and verifying responses make sure participant intentions map to their responses
UX GOALS, METRICS, AND TARGETS
Hartson Chapter 10.
INTRODUCTION
Figure 10-1 You are here; the chapter on UX goals, metrics, and targets in the context of the overall Wheel lifecycle template.
UX GOALS
• Example: User Experience Goals for Ticket Kiosk System
– We can define the primary high-level UX goals for the ticket buyer to include:
• Fast and easy walk-up-and-use user experience, with absolutely no user training
• Fast learning so new user performance (after limited experience) is on par with that
of an experienced user [from AB-4-8]
• High customer satisfaction leading to high rate of repeat customers [from BC-6-16]
– Some other possibilities:
• High learnability for more advanced tasks [from BB-1-5]
• Draw, engagement, attraction
• Low error rate for completing transactions correctly, especially in the interaction
for payment [from CG-13-17]
UX TARGET TABLES
Table 10-1 Our UX target table, as evolved from the Whiteside, Bennett, and Holtzblatt (1988) usability specification table
WORK ROLES, USER CLASSES, AND UX GOALS
Work role & user class: Ticket buyer (casual new user, for occasional personal use)
UX goal: Walk-up ease of use for new user
Table 10-2 Choosing a work role, user class, and UX goal for a UX target
UX MEASURES
• Objective UX measures (directly measurable by evaluators)
– Initial performance
– Long-term performance (longitudinal, experienced, steady state)
– Learnability
– Retainability
– Advanced feature usage
• Subjective UX measures (based on user opinions)
– First impression (initial opinion, initial satisfaction)
– Long-term (longitudinal) user satisfaction
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression
Table 10-3 Choosing initial performance and first impression as UX measures
MEASURING INSTRUMENTS
• Benchmark Tasks
– Address designer questions with benchmark tasks and UX targets
– Selecting benchmark tasks
• Create benchmark tasks for a representative spectrum of user tasks.
• Start with short and easy tasks and then increase difficulty progressively.
• Include some navigation where appropriate.
• Avoid large amounts of typing (unless typing skill is being evaluated).
• Match the benchmark task to the UX measure.
• Adapt scenarios already developed for design.
• Use tasks in realistic combinations to evaluate task flow.
MEASURING INSTRUMENTS
• Do not forget to evaluate with your power users.
• To evaluate error recovery, a benchmark task can begin in an error state.
• Consider tasks to evaluate performance in “degraded modes” due to partial
equipment failure.
• Do not try to make a benchmark task for everything.
– Constructing benchmark task content
• Remove any ambiguities with clear, precise, specific, and repeatable instructions.
• Tell the user what task to do, but not how to do it.
• Do not use words in benchmark tasks that appear specifically in the interaction
design.
MEASURING INSTRUMENTS
• Use work context and usage-centered wording, not system-oriented wording.
• Have clear start and end points for timing.
• Keep some mystery in it for the user.
• Annotate situations where evaluators must ensure pre-conditions for running
benchmark tasks.
• Use “rubrics” for special instructions to evaluators.
• Put each benchmark task on a separate sheet of paper.
• Write a “task script” for each benchmark task.
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | (not yet chosen)
Table 10-4 Choosing “buy special event ticket” benchmark task as measuring instrument for “initial performance” UX measure in first UX target
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | (not yet chosen)
Table 10-5 Choosing “buy movie ticket” benchmark task as measuring instrument for second initial performance UX measure
MEASURING INSTRUMENTS
– How many benchmark tasks and UX targets do you need?
– Ensure ecological validity: as you write your benchmark task descriptions, ask how
the setting can be made more realistic
• What are constraints in user or work context?
• Does the task involve more than one person or role?
• Does the task require a telephone or other physical props?
• Does the task involve background noise?
• Does the task involve interference or interruption?
• Does the user have to deal with multiple simultaneous inputs, for example,
multiple audio feeds through headsets?
MEASURING INSTRUMENTS
Work role & user class | UX goal | UX measure | Measuring instrument
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire
Table 10-6 Choosing questionnaire as measuring instrument for first-impression UX measure
MEASURING INSTRUMENTS
UX goal | UX measure | Measuring instrument / UX metric
Ease of first-time use | Initial performance | Time on task
Ease of learning | Learnability | Time on task or error rate, after a given amount of use and compared with initial performance
High performance for experienced users | Long-term performance | Time and error rates
Low error rates | Error-related performance | Error rates
Error avoidance in safety-critical tasks | Task-specific error performance | Error count, with strict target levels (much more important than time on task)
Error recovery performance | Task-specific time performance | Time on recovery portion of the task
Overall user satisfaction | User satisfaction | Average score on questionnaire
User attraction to product | User opinion of attractiveness | Average score on questionnaire, with questions focused on the effectiveness of the “draw” factor
Quality of user experience | User opinion of overall experience | Average score on questionnaire, with questions focused on quality of the overall user experience, including specific points about your product that might be associated most closely with emotional impact factors
Overall user satisfaction | User satisfaction | Average score on questionnaire, with questions focusing on willingness to be a repeat customer and to recommend the product to others
Continuing ability of users to perform without relearning | Retainability | Time on task and error rates, re-evaluated after a period of time off (e.g., a week)
Avoid having users walk away in dissatisfaction | User satisfaction, especially initial satisfaction | Average score on questionnaire, with questions focusing on initial impressions and satisfaction
Table 10-7 Close connections among UX goals, UX measures, and measuring instruments
UX METRICS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in the QUIS questionnaire | Average rating across users and across questions
Table 10-8 Choosing UX metrics for UX measures
SETTING LEVELS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric | Baseline level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 minutes
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10
Table 10-9 Setting baseline levels for UX measures
SETTING LEVELS
Work role & user class | UX goal | UX measure | Measuring instrument | UX metric | Baseline level | Target level
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT1: Buy special event ticket | Average time on task | 3 min, as measured at the MUTTS ticket counter | 2.5 min
Ticket buyer: Casual new user, for occasional personal use | Walk-up ease of use for new user | Initial user performance | BT2: Buy movie ticket | Average number of errors | <1 | <1
Ticket buyer: Casual new user, for occasional personal use | Initial customer satisfaction | First impression | Questions Q1–Q10 in questionnaire XYZ | Average rating across users and across questions | 7.5/10 | 8/10
Ticket buyer: Frequent music patron | Accuracy | Experienced usage error rate | BT3: Buy concert ticket | Average number of errors | <1 | <1
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average time on task | 5 min (online system) | 2.5 min
Casual public ticket buyer | Walk-up ease of use for new user | Initial user performance | BT4: Buy Monster Truck Pull tickets | Average number of errors | <1 | <1
Casual public ticket buyer | Initial customer satisfaction | First impression | QUIS questions 4–7, 10, 13 | Average rating across users and across questions | 6/10 | 8/10
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT5: Buy Almost Famous movie tickets | Average time on task | 5 min (including review) | 2 min
Casual public ticket buyer | Walk-up ease of use for user with a little experience | Just post-initial performance | BT6: Buy Ben Harper concert tickets | Average number of errors | <1 | <1
Table 10-10 Setting target levels for UX metrics
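A UX target table like Table 10-10 can also be kept as structured data so that observed results are checked against target levels mechanically. A minimal sketch: the rows paraphrase two entries from the table, the observed values are invented, and both metrics here happen to be "lower is better".

```python
# UX target rows kept as structured data so observed results can be
# checked mechanically against target levels.
targets = [
    {"task": "BT1: Buy special event ticket",
     "metric": "average time on task (min)", "baseline": 3.0, "target": 2.5},
    {"task": "BT2: Buy movie ticket",
     "metric": "average number of errors", "baseline": 1.0, "target": 1.0},
]

# Hypothetical observed results from an evaluation session.
observed = {"BT1: Buy special event ticket": 2.3,
            "BT2: Buy movie ticket": 0.4}

met = {}
for row in targets:
    value = observed[row["task"]]
    met[row["task"]] = value <= row["target"]  # lower is better for both
    print(f"{row['task']}: observed {value} vs. target {row['target']} -> "
          f"{'met' if met[row['task']] else 'not met'}")
```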
PRACTICAL TIPS AND CAUTIONS FOR CREATING UX TARGETS
• Are user classes for each work role specified clearly enough?
– Have you taken into account potential trade-offs among user groups?
– Are the values for the various levels reasonable?
– Be prepared to adjust your target level values, based on initial observed
results
– Remember that the target level values are averages.
– How well do the UX measures capture the UX goals for the design?
– What if the design is in its early stages and you know the design will change
significantly in the next version, anyway?
– What about UX goals, metrics, and targets for usefulness and emotional
impact?
Choosing the Right Metrics: Ten Types of Usability Studies
• Issue Based Metrics (Ch 5)
– Anything that prevents task completion
– Anything that takes someone off course
– Anything that creates some level of confusion
– Anything that produces an error
– Not seeing something that should be noticed
– Assuming something should be correct when it is not
– Assuming a task is complete when it is not
– Performing the wrong action
– Misinterpreting some piece of content
– Not understanding the navigation
Metric types: Task Success, Task Time, Errors, Efficiency, Learnability, Issue Based Metrics, Self Reported Metrics, Behavioral and Physiological Metrics, Combined and Comparative Metrics, Live Website Metrics, Card Sorting Data
Choosing the Right Metrics: Ten Types of Usability Studies
• Self Reported Metrics (Ch 6): Asking participants for information about their
perception of the system and their interaction with it
– Overall interaction
– Ease of use
– Effectiveness of navigation
– Awareness of certain features
– Clarity of terminology
– Visual appeal
– Likert scales
– Semantic differential scales
– After-scenario questionnaire
– Expectation measures
– Usability Magnitude Estimation
– SUS
– CSUQ (Computer System Usability Questionnaire)
– QUIS (Questionnaire for User Interface Satisfaction)
– WAMMI (Website Analysis & Measurement Inventory)
– Product Reaction Cards
Choosing the Right Metrics: Ten Types of Usability Studies
• Behavioral and Physiological Metrics (Ch 7)
– Verbal Behaviors
• Strongly positive comment
• Strongly negative comment
• Suggestion for improvement
• Question
• Variation from expectation
• Stated confusion/frustration
– Nonverbal Behaviors
• Frowning/Grimacing/Unhappy
• Smiling/Laughing/Happy
• Surprised/Unexpected
• Furrowed brow/Concentration
• Evidence of impatience
• Leaning in close to screen
• Fidgeting in chair
• Rubbing head/eyes/neck
Choosing the Right Metrics: Ten Types of Usability Studies
• Combined and Comparative Metrics (Ch 8)
– Taking smaller pieces of raw data, like
task completion rates, time-on-task, and
self-reported ease of use, to derive new
metrics such as an overall usability
metric or a usability scorecard
– Comparing existing usability data to
expert or ideal results
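One simple way, among many, to combine raw measures into a single usability score is to rescale each metric to a 0-1 "higher is better" range and average. A sketch; the scale bounds and equal weighting below are illustrative choices, not a standard formula:

```python
# Combine task success, time-on-task, and a self-reported rating into one
# score by rescaling each to 0-1 and averaging (illustrative weighting).
def combined_score(success_rate, time_s, rating, max_time_s=300, rating_max=7):
    time_score = 1 - min(time_s, max_time_s) / max_time_s  # faster -> higher
    rating_score = rating / rating_max
    return (success_rate + time_score + rating_score) / 3

score = combined_score(0.8, 120, 6)  # 80% success, 120 s, rating 6 of 7
print(round(score, 3))
```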
Choosing the Right Metrics: Ten Types of Usability Studies
• Live Website Metrics (Ch 9)
– Information you can glean from live data
on a production website
• Server logs – page views and visits
• Click-through rates – number of times a link is shown vs.
actually clicked
• Drop-off rates – abandoned processes
• A/B studies – manipulate the pages users
see and compare metrics between them
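The click-through and A/B ideas above can be sketched with a two-proportion z-test (normal approximation); all counts below are invented:

```python
import math

# Compare click-through rates for two page variants (A/B study) with a
# two-proportion z-test under the normal approximation.
def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

ctr_a, ctr_b, z, p = two_proportion_z(120, 2400, 90, 2400)
print(f"CTR A {ctr_a:.1%}, CTR B {ctr_b:.1%}, z={z:.2f}, p={p:.3f}")
```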
Choosing the Right Metrics: Ten Types of Usability Studies
• Card Sorting Data (Ch 9)
– Open card sort
• Give participants cards, they sort and
define groups
– Closed card sort
• Give participants cards and name of
groups, they put cards into groups
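A common first analysis step for open card sort data is a co-occurrence count of how often each pair of cards landed in the same group; the sorts below are invented:

```python
from itertools import combinations

# Count how often each pair of cards was placed in the same group across
# participants' open sorts; high counts suggest cards users see as related.
sorts = [
    {"Nav": ["home", "search"], "Buy": ["cart", "checkout"]},
    {"Find": ["home", "search", "cart"], "Pay": ["checkout"]},
]

pair_counts = {}
for participant in sorts:
    for group in participant.values():
        for a, b in combinations(sorted(group), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

print(pair_counts)
```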
Choosing the Right Metrics: Ten Types of Usability Studies
• Increasing Awareness
– Aimed at increasing awareness of a specific piece of content
or functionality
– Why is something not noticed or used?
• Metrics
– Live Website Metrics
• Monitor interactions
• Not foolproof – user may notice and decide not to click,
alternatively user may click but not notice interaction
• A/B testing to see how small changes impact user behavior
– Self Reported Metrics
• Pointing out specific elements to user and asking whether
they had noticed those elements during task
• Ask whether they were aware of the feature before the study began
– Not everyone has good memory
• Show users different elements and ask them to choose
which one they saw during task
– Behavioral and Physiological Metrics
• Eye tracking
– Determine whether participants looked at a certain element
– Average time spent looking at a certain element
Choosing the Right Metrics: Ten Types of Usability Studies
• Problem Discovery
– Identify major usability issues
– After deployment, find out what annoys users
– Periodic checkup to see how users are interacting with
the product
• Discovery vs. usability study
– Open-ended
– Participants may generate own tasks
– Strive for realism in typical task and in user’s
environment
– Comparing across participants can be difficult
• Metrics
– Issue Based Metrics
• Capture all usability issues, you can convert into type
and frequency
• Assign severity rating and develop a quick-hit list of
design improvements
– Self Reported Metrics
Choosing the Right Metrics: Ten Types of Usability Studies
• Creating an Overall Positive User Experience
– Not enough to be usable, want exceptional user
experience
– Thought provoking, entertaining, slightly addictive
– Performance useful, but what user thinks, feels, and
says really matters
• Metrics
– Self Reported
• Satisfaction – common but not enough
• Exceed expectations – want user to say it was easier,
more efficient, or more entertaining than expected
• Likelihood to purchase, use in future
• Recommend to a friend
• Behavioral and Physiological
– Pupil diameter
– Heart rate
– Skin conductance
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs
– Comparing more than one design alternative
– Early in the design process teams put together semi-functional prototypes
– Evaluate using predefined set of metrics
• Participants
– Can’t ask the same participant to perform the same tasks with
all designs
– Even with counterbalancing of design and task order, carryover
effects limit the value of the data
• Procedure
– Run the study as between-subjects; each participant works with
only one design
– Alternatively, have a primary design each participant works with, then
show the alternative designs and ask for a preference
Choosing the Right Metrics: Ten Types of Usability Studies
• Comparing Designs (continued)
• Metrics
– Task Success
• Indicates which design is more usable
• With a small sample size, of limited value
– Task Time
• Indicates which design is more usable
• With a small sample size, of limited value
– Issue Based Metrics
• Compare the frequency of high-, medium-, and
low-severity issues across designs to see which one is
most usable
– Self Reported Metrics
• Ask participant to choose the prototype they would
most like to use in the future (forced comparison)
• Ask participants to rate each prototype along
dimensions such as ease of use and visual appeal
Independent & Dependent Variables
Independent variables:
– The things you manipulate or control for
– Aspects of a study that you manipulate
– Chosen based on the research question
– e.g.:
• Characteristics of participants (e.g., age, sex, relevant experience)
• Different designs or prototypes being tested
• Tasks

Dependent variables:
– The things you measure
– Describe what happened as a result of the study
– Something you measure as the result of, or as dependent on, how you manipulate the independent variables
– e.g.:
• Task Success
• Task Time
• SUS score
• etc.
Need to have a clear idea of what you plan to manipulate and what you plan to measure
Designing a Usability Study
RQ 1
• Research Question: Differences in performance between males and females
• Independent variable: Gender
• Dependent variable: Task completion time

RQ 2
• Research Question: Differences in satisfaction between novice and expert users
• Independent variable: Experience level
• Dependent variable: Satisfaction
Types of Data
• Nominal (aka Categorical)
– e.g., Male, Female; Design A, Design B.
• Ordinal
– e.g., Rank ordering of 4 designs tested from Most Visually Appealing to
Least Visually Appealing.
• Interval
– e.g., 7-point scale of agreement: “This design is visually appealing.
Strongly Disagree . . . Strongly Agree”
• Ratio
– e.g., Time, Task Success %
NOMINAL DATA
• Definition
– Unordered groups or categories
– Without order, cannot say one is better than another
• May provide characteristics of users, independent variables that allow you to segment
data
– Windows versus Mac users
– Geographical location
– Males versus females
• What about dependent variables?
– Number of users who clicked on A vs. B
– Task success
• Usage
– Counts and frequencies
ORDINAL DATA
• Definition
– Ordered groups and categories
– Data is ordered in a certain way but intervals between measurements are not
meaningful
• Ordinal data comes from self-reported data on questionnaires
– Website rated as excellent, good, fair, or poor
– Severity rating of problem encountered as high, medium, or low
• Usage
– Looking at frequencies
– Calculating an average is meaningless (the distance between high and medium may not be the same as between medium and low)
INTERVAL DATA
• Definition
– Continuous data where differences between the measurements are meaningful
– Zero point on the scale is arbitrary
• System Usability Scale (SUS)
– Example of interval data
– Based on self-reported data from a series of questions about overall usability
– Scores range from 0 to 100
• Higher score indicates better usability
• Distance between points is meaningful because it indicates an increase/decrease in perceived usability
• Usage
– Able to calculate descriptive statistics such as average, standard deviation, etc.
– Inferential statistics can be used to generalize to a population
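The SUS score mentioned above is computed with a simple, well-known recipe: odd (positively worded) items contribute their rating minus 1, even (negatively worded) items contribute 5 minus their rating, and the sum is scaled by 2.5 to land on 0–100. A minimal sketch with hypothetical responses from one participant:

```python
def sus_score(responses):
    """SUS score from ten 1-5 Likert responses, using the standard
    scoring: odd items contribute r - 1, even items contribute 5 - r,
    and the sum is scaled by 2.5 onto a 0-100 range."""
    assert len(responses) == 10, "SUS has exactly ten items"
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Hypothetical answers from one participant (items 1-10)
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```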
Ordinal vs. Interval Rating Scales
• Are these two scales different?
• Top scale is ordinal. You should only calculate frequencies of each
response.
• Bottom scale can be considered interval. You can also calculate
means.
RATIO DATA
• Definition
– Same as interval data, with the addition of an absolute zero
– Zero has inherent meaning
• Example
– The difference between a person aged 35 and one aged 38 is the same as the difference between people aged 12 and 15
– With time to completion, you can say that one participant is twice as fast as another
• Usage
– Most analyses work with both ratio and interval data
– The geometric mean is an exception; it requires ratio data
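To see why the geometric mean needs ratio data: it multiplies values and takes the n-th root, so an arbitrary zero point (as in interval data) would distort it. A short sketch with hypothetical task-completion times:

```python
from math import prod

def geometric_mean(values):
    """Geometric mean: the n-th root of the product of n values.
    Meaningful only for ratio data (e.g., task times), where zero
    is absolute and ratios between values make sense."""
    return prod(values) ** (1 / len(values))

# Hypothetical task-completion times in seconds
times = [30, 45, 60, 120]
print(round(geometric_mean(times), 1))  # 55.8
```

The geometric mean is often preferred over the arithmetic mean for task times because time data is typically skewed by a few slow participants.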
Confidence Intervals
• Assume this was your time data for a study with 5 participants:
Does that make a difference in your answer?
Calculating Confidence Intervals
=CONFIDENCE(<alpha>, <std dev>, <n>)
– <alpha> is normally .05 (for a 95% confidence interval)
– <std dev> is the standard deviation of the set of numbers (9.6 in this example)
– <n> is how many numbers are in the set (5 in this example)
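The same margin of error can be computed outside Excel. A minimal Python sketch of what =CONFIDENCE() does (the two-tailed normal critical value times the standard error), using the values from the slide:

```python
from math import sqrt
from statistics import NormalDist

def confidence(alpha, std_dev, n):
    """Margin of error for a mean, mirroring Excel's =CONFIDENCE():
    the two-tailed normal critical value times the standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return z * std_dev / sqrt(n)

# Values from the slide: alpha = .05, std dev = 9.6, n = 5
print(round(confidence(0.05, 9.6, 5), 2))  # 8.41
```

The mean plus or minus this margin gives the 95% confidence interval; note that with only 5 participants, a t-based interval would be somewhat wider than this normal-based one.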
Excel Example
Show Error Bars
Excel Example
Binary Success
• Pass/fail (or other binary criteria)
• 1’s (success) and 0’s (failure)
Confidence Interval for Task Success
• When you look at task success data across participants for a single
task the data is commonly binary:
– Each participant either passed or failed on the task.
• In this situation, you need to calculate the confidence interval using
the binomial distribution.
Example
– The easiest way to calculate the confidence interval is to use Jeff Sauro's web calculator:
– http://www.measuringusability.com/wald.htm
1=success, 0=failure. So, 6/8 succeeded, or 75%.
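If the calculator is unavailable, the adjusted-Wald (Agresti-Coull) interval, a common choice for small-sample task-success data, is easy to compute by hand. A sketch using the slide's 6-of-8 example:

```python
from math import sqrt
from statistics import NormalDist

def adjusted_wald(successes, trials, alpha=0.05):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a
    binomial proportion: add z^2/2 successes and z^2 trials,
    then apply the ordinary Wald formula to the adjusted rate."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = (successes + z * z / 2) / (trials + z * z)
    half = z * sqrt(p * (1 - p) / (trials + z * z))
    return max(0.0, p - half), min(1.0, p + half)

# Slide example: 6 of 8 participants succeeded (75%)
low, high = adjusted_wald(6, 8)
print(f"{low:.0%} to {high:.0%}")
```

The interval is wide (roughly 40% to 94%), which is exactly the point: with 8 participants, an observed 75% success rate pins the true rate down only loosely.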
Chi-square
• Allows you to compare actual and expected frequencies for
categorical data.
=CHITEST(<actual range>,<expected range>)
Excel Example
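What =CHITEST() does can be sketched in a few lines: compute Pearson's chi-square statistic from the observed and expected counts, then convert it to a p-value. The sketch below uses the closed-form chi-square survival function, which holds only for even degrees of freedom; the data is hypothetical:

```python
from math import exp

def chi_square_p(actual, expected):
    """p-value for Pearson's chi-square goodness-of-fit test,
    analogous to =CHITEST(actual, expected). The closed-form
    survival function below is valid for EVEN degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    stat = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
    df = len(actual) - 1  # one row of categories
    assert df % 2 == 0, "this closed form needs an even df"
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (stat / 2) / i
        total += term
    return exp(-stat / 2) * total

# Hypothetical data: first clicks of 100 users across 3 designs,
# tested against an expectation of equal preference
p = chi_square_p([20, 45, 35], [100 / 3] * 3)
print(round(p, 4))  # 0.0087
```

A p-value below .05 here would suggest that users' design preferences are not uniform.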
Comparing Means
T-test
• Independent samples (between subjects)
– Apollo websites, task times
T-test
• Paired samples (within subjects)
– Haptic mouse study
T-tests in Excel
=TTEST(<array1>, <array2>, x, y)
x = 2 (for a two-tailed test) in almost all cases
y = 2 for independent samples; y = 1 for paired samples
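Behind =TTEST(), Excel computes a t statistic and looks up its p-value. The statistics themselves are short to compute; a stdlib-only sketch of the equal-variance independent form and the paired form, with hypothetical task times:

```python
from math import sqrt
from statistics import mean, stdev

def t_independent(a, b):
    """t statistic for two independent samples (the equal-variance
    form that =TTEST(..., 2, 2) assumes)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 +
                  (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def t_paired(a, b):
    """t statistic for paired samples (within subjects), as in
    =TTEST(..., 2, 1): a one-sample t-test on the differences."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical task times (seconds) for two designs
design_a = [34, 41, 29, 38, 45]
design_b = [28, 33, 30, 25, 31]
print(round(t_independent(design_a, design_b), 2))  # 2.59
print(round(t_paired(design_a, design_b), 2))       # 2.96
```

The t statistic is then compared against the t distribution with the appropriate degrees of freedom to get the p-value, which is the step Excel performs for you.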
Comparing Multiple Means
• Analysis of Variance (ANOVA)
“Tools” > “Data Analysis” > “Anova: Single Factor”
Excel example: study comparing 4 navigation approaches for a website
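The F statistic that Excel's Anova: Single Factor reports is the between-group mean square divided by the within-group mean square. A sketch with hypothetical task times for four navigation approaches:

```python
from statistics import mean

def anova_f(*groups):
    """F statistic for a single-factor ANOVA: mean square between
    groups divided by mean square within groups."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical task times (seconds) for 4 navigation approaches
f = anova_f([30, 32, 28], [35, 38, 36], [31, 29, 33], [40, 42, 41])
print(round(f, 1))  # 27.6
```

A large F (relative to the F distribution with k−1 and n−k degrees of freedom) indicates that at least one navigation approach differs from the others; follow-up pairwise comparisons are then needed to find which.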
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Goal
– To gain experience in writing effective benchmark tasks and measurable UX targets.
• Activities
– We have shown you a rather complete set of examples of benchmark tasks and UX targets for the Ticket Kiosk
System. Your job is to do something similar for the system of your choice.
– Begin by identifying which work roles and user classes you are targeting in evaluation (brief description is
enough).
– Write three or more UX table entries (rows), including your choices for each column. Have at least two UX
targets based on a benchmark task and at least one based on a questionnaire.
– Create and write up a set of about three benchmark tasks to go with the UX targets in the table.
• Do NOT make the tasks too easy.
• Make tasks increasingly complex.
• Include some navigation.
• Create tasks that you can later “implement” in your low-fidelity rapid prototype.
• The expected average performance time for each task should be no more than about 3 minutes, just to keep it
short and simple for you during evaluation.
– Include the questionnaire question numbers in the measuring instrument column of the appropriate UX target.
Exercise 10-2: Creating Benchmark Tasks and UX Targets for Your System
• Cautions and hints:
– Do not spend any time on design in this exercise; there will be time for detailed design in
the next exercise.
– Do not plan to give users any training.
• Deliverables:
– Two user benchmark tasks, each on a separate sheet of paper.
– Three or more UX targets entered into a blank UX target table on your laptop or on paper.
– If you are doing this exercise in a classroom environment, finish up by reading your
benchmark tasks to the class for critique and discussion.
• Schedule
– Work efficiently and complete in about an hour and a half.