Utility of Human-Computer Interactions: Toward a Science of Preference Measurement
Michael Toomim, Travis Kriplean, Claus Pörtner, and James A. Landay
University of Washington, dub Group
CHI 2011
Discretionary Use of Interfaces
• The CHI research community grew out of the discretionary use of computer interfaces (starting in the 1980s), meaning free choice: people choose which interfaces to use to accomplish their tasks
• Now the task (and its goal) is itself a choice (e.g., blogs, web browsing, SNS, Wikipedia), as is the use of ubiquitous applications (e.g., smartphones, Nike+iPod)
• Widely accepted evaluation metrics in CHI research:
– They give only an indirect prediction of whether an interface will be preferred over the alternatives
– Examples: time on task, number of errors, subjective interpretations of think-aloud sessions, survey reports
Evaluating “User Choices”
• Industry: A/B testing (split testing, bucket testing)
– A marketing test method in which multiple versions of one element are tested against a metric to determine which is more successful
– The versions are tested simultaneously to determine which is better
– Conversions are measured from different sets of users (between-subjects)
• Yet A/B testing is challenging: it needs a large up-front investment and a large existing user base to deploy to and test on (say, thousands of people)
• Sample size matters
Figure: Control (baseline) vs. Treatment A vs. Treatment B, compared with a statistical significance test (e.g., t-test or chi-square)
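To make the "sample size matters" point concrete, here is a minimal sketch of a chi-square test on two conversion rates. It is an illustration only, not the procedure of any particular A/B testing tool; the function name and arguments are hypothetical, and it uses the plain Pearson statistic without a continuity correction.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test for two conversion rates (between-subjects).

    conv_a / n_a: conversions and users in arm A; likewise for arm B.
    Assumes each outcome (converted / not converted) occurs at least once
    overall, otherwise an expected count would be zero.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With the same 10% vs. 15% conversion rates, 100 users per arm is not enough to reach significance at the 0.05 level, while 10,000 users per arm is.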
Measuring User’s Preference
• Proposal: a semi-automated approach
– Post thousands of “interface test tasks” to M-Turk
– Observe how workers choose to complete the tasks (and how many times they do so)
– Analyze the data to measure preference
• How?
Example: Fitts’ law test
• Fitts’ law models the time required to click a widget of a given width at a given distance; this technique can model how much people prefer to use a widget
• Difficulty = f(width, distance)
• For a given job, subjects are asked to click on a blue rectangle 60 times
• Each time they clicked on the bar, it moved to the opposite side of the screen
Figure labels: Width, Distance, Click!, Bar moves
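The slides leave the difficulty function as f(width, distance). One common choice is the Shannon formulation of Fitts' index of difficulty, sketched below; whether the study used exactly this form is an assumption, and the intercept `a` and slope `b` are placeholders that would have to be fitted to data.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(distance / width + 1), in bits.

    Assumption: this is one widely used form of f(width, distance);
    the slides do not say which form the study used.
    """
    if width <= 0 or distance < 0:
        raise ValueError("width must be positive and distance non-negative")
    return math.log2(distance / width + 1.0)


def predicted_time(distance, width, a, b):
    """Fitts' law: movement time = a + b * ID (a and b are fitted constants)."""
    return a + b * index_of_difficulty(distance, width)
```

For example, a bar 100 pixels wide at a distance of 700 pixels has an index of difficulty of 3 bits; making the bar narrower or moving it farther away raises the value.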
Example: Fitts’ law test
Participants were assigned to one of three index-of-difficulty conditions. Each point is the number of clicks a participant completed before quitting (points are jittered to show spread)
Participants preferred big buttons to small buttons (p < 0.10)
Participants were allowed a maximum of 3,060 clicks each
The regression line accounts for this maximum using a Tobit analysis
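Because click counts are capped at 3,060, an ordinary regression would treat a capped participant as if they had chosen to stop there. A Tobit model instead treats such points as right-censored. The sketch below shows only the log-likelihood of a linear Tobit model with normal errors; the parameterization and function names are assumptions for illustration, and the study's actual model specification is not given in the slides.

```python
import math


def _norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_sf(z):
    """Upper tail probability P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tobit_loglik(params, xs, ys, upper):
    """Log-likelihood of y = intercept + slope * x + noise, right-censored at `upper`.

    params: (intercept, slope, sigma). Observations with y >= upper are
    treated as censored: all we know is that the latent value is >= upper.
    """
    intercept, slope, sigma = params
    if sigma <= 0:
        return float("-inf")
    total = 0.0
    for x, y in zip(xs, ys):
        mu = intercept + slope * x
        if y >= upper:
            # Censored point: contributes P(latent value >= upper).
            tail = _norm_sf((upper - mu) / sigma)
            total += math.log(max(tail, 1e-300))
        else:
            # Uncensored point: contributes the usual normal density.
            total += _norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing this function over the three parameters (with any numerical optimizer) gives a regression line that is not pulled down by participants who hit the cap.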
Utility
• Utility in economics:
– The degree to which a person prefers a particular choice among the available options
• When a user chooses to use system A instead of B, we say that Utility(A) > Utility(B)
• Use economic utility to quantify aggregate user preference
– Example: a user has no preference between (1) being paid $0.25 for using system A and (2) being paid $0.50 for using system B
– Money metric of utility: |Utility(A) – Utility(B)| = $0.25
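The $0.25 figure follows from the indifference condition in one step, assuming that a payment simply adds its dollar amount to utility (a quasi-linear assumption that the slides do not state explicitly):

```latex
% Assumption: utility is additive in money, so being paid m dollars adds m to utility.
\begin{align*}
\text{Utility}(A) + 0.25 &= \text{Utility}(B) + 0.50 \\
\text{Utility}(A) - \text{Utility}(B) &= 0.50 - 0.25 = 0.25
\end{align*}
```

Under that assumption, the user needs $0.25 more to use B than to use A, so A is preferred by $0.25.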
Measuring Utility
• Utility = f(task, interface, context)
– A user finds value in completing a task, but to do so must take some actions with a computer through some interface
– And the user’s context matters (e.g., demographics, social, moral status, etc.)
• Preference measurement begins with determining how much you must pay people to convince them to use an interface for a task
Measuring Utility
• Reservation wage: the wage below which a worker will not take a task
• Present a worker with a job at a price and observe their behavior: the worker will either complete a task at a given price or not
• Gather/analyze all the data: (Interface ID, Worker ID, Wage, Number of Completions)
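As a rough sketch of the first analysis step, the records can be stored as (Interface ID, Worker ID, Wage, Number of Completions) tuples and summarized per interface and wage. The class and function names here are hypothetical, and averaging completions is only one simple summary, not necessarily the analysis used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One worker's behavior under one condition."""
    interface_id: str
    worker_id: str
    wage: float        # price offered per task, in dollars
    completions: int   # number of tasks the worker completed


def mean_completions(observations):
    """Average completions per worker for each (interface, wage) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of completions, workers]
    for obs in observations:
        cell = totals[(obs.interface_id, obs.wage)]
        cell[0] += obs.completions
        cell[1] += 1
    return {key: done / workers for key, (done, workers) in totals.items()}
```

If one interface needs a higher wage than another to reach the same average number of completions, that wage gap is a rough money-metric estimate of the preference between them.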
Measuring Utility
• Posting all scenarios/conditions simultaneously to M-Turk
• Handling selection bias via a mystery task with “??? price”
• Setting a limit on sub-tasks that a single worker can complete (e.g., 50)
• Handling market price fluctuations (as people like to take high-paying tasks)
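One way these controls could fit together is sketched below. It assumes that a worker who accepts the mystery task is then assigned a condition at random and keeps it, which the slides imply but do not spell out; the `Assigner` class and its methods are hypothetical, not part of any M-Turk API.

```python
import itertools
import random


def build_conditions(interfaces, wages):
    """Every (interface, wage) combination, to be posted at the same time."""
    return list(itertools.product(interfaces, wages))


class Assigner:
    """Assigns each worker one random condition and caps their sub-tasks."""

    def __init__(self, conditions, limit=50, seed=None):
        self._conditions = list(conditions)
        self._limit = limit
        self._rng = random.Random(seed)
        self._assigned = {}   # worker_id -> condition
        self._completed = {}  # worker_id -> number of sub-tasks done

    def condition_for(self, worker_id):
        """Reveal the worker's condition; the same worker always gets the same one."""
        if worker_id not in self._assigned:
            self._assigned[worker_id] = self._rng.choice(self._conditions)
        return self._assigned[worker_id]

    def record_completion(self, worker_id):
        """Count one finished sub-task; return False once the limit is reached."""
        done = self._completed.get(worker_id, 0)
        if done >= self._limit:
            return False
        self._completed[worker_id] = done + 1
        return True
```

Because the price is hidden until after the worker accepts, workers cannot pick the best-paying condition up front, so the groups being compared stay comparable.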
Fitts’ Law Study
Subjects clicked on a blue rectangle 60 times
Each time they clicked on the bar, it moved to the opposite side of the screen
Figure labels: Width, Distance
Difficulty = f(width, distance)
Fitts’ Law Study
Price range: $0.01-$0.06
Difficulty: easy, medium, hard
Each task: 60 clicks
Upper limit of # tasks: 51
5 hours 15 minutes, $970
Aesthetics: CAPTCHAs
• The survival graph shows how many workers made it through how many tasks, for each of the four experimental conditions
• The pretty and ugly lines are separated at the left but converge toward the right
– This suggests either that the utility effect of aesthetics fades over time, or that the types of users who complete many CAPTCHAs are more concerned with pay than aesthetics.
The shaded regions are 95% confidence intervals
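A survival curve of this kind can be computed directly from per-worker completion counts. The sketch below gives the fraction of workers who completed at least k tasks, for one condition; the function name is hypothetical and it does not compute the confidence intervals shown in the graph.

```python
def survival_curve(completions_per_worker, max_tasks):
    """Fraction of workers who completed at least k tasks, for k = 0..max_tasks.

    completions_per_worker: one completion count per worker in a single
    experimental condition.
    """
    n_workers = len(completions_per_worker)
    if n_workers == 0:
        raise ValueError("need at least one worker")
    curve = []
    for k in range(max_tasks + 1):
        survivors = sum(1 for done in completions_per_worker if done >= k)
        curve.append(survivors / n_workers)
    return curve
```

Computing one curve per condition and plotting them together would show whether the curves separate early and converge later, as described above.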