EXPERIMENTS - Stanford University
Scott Klemmer and Michael Bernstein

TRANSCRIPT

Page 1: EXPERIMENTS - Stanford University

Scott Klemmer and Michael Bernstein

EXPERIMENTS

Page 2:

“Do You Like My Interface?”

Page 3:

“How much do you like my interface?”

“This is a useful interface: agree/disagree”

Page 4:

The please-the-experimenter bias

Page 5:

However: Watching people fail to use your design is one of the most effective learning tools anywhere.

Page 6:

Today: measuring success offline and online
• Variables, validity
• Randomization
• Usability testing
• Online A/B testing

Page 7:

Getting beyond “do you like my interface?”
• What’s the comparison?
• What’s the yardstick?

Page 8:

Getting beyond “do you like my interface?”
• Baserates: How often does Y occur? Requires measuring Y.

Page 9:

Getting beyond “do you like my interface?”
• Baserates: How often does Y occur? Requires measuring Y.
• Correlations: Do X and Y co-vary? Requires measuring X and Y.

Page 10:

Getting beyond “do you like my interface?”
• Baserates: How often does Y occur? Requires measuring Y.
• Correlations: Do X and Y co-vary? Requires measuring X and Y.
• Causes: Does X cause Y? Requires measuring X and Y, and manipulating X. Also requires somehow accounting for the effects of other independent variables (confounds)!

Page 11:

Let’s introduce a few terms...

Page 12:

Independent Variables: manipulations

Page 13:

Dependent Variables: measures

Page 14:

Internal Validity: precision

Page 15:

External Validity: generalizability

Page 16:

Source: PC World

Page 17:

Benefits and drawbacks?
• Manipulation: Input style
• Measure: Words per minute
• External validity: not so much

Page 18:

A better version: actual users
• Manipulation: Input style
• Measure: Words per minute... and error rate

An In-Depth Look into the Text Entry User Experience on the iPhone, Jennifer M. Allen, Leslie A. McFarlin and Thomas Green, Proceedings of the Human Factors and Ergonomics Society Annual Meeting 2008

Page 19:

iPhone & Qwerty users: similar speed, but...

Page 20:

Strategies for fairer comparisons
• Insert your new approach into the production setting
• Recreate the production approach in your new setting
• Scale things down so you’re just looking at a piece of a larger system
• When expertise is relevant, train people up

Page 21:

1) Experimental Design

Page 22:

Controlled comparison enables causal inference.
• Between vs. within subjects designs
• Randomization

Page 23:

Should every participant use every alternative?

Page 24:

Which professor style is more effective?

Page 25:

What are the measures?
• Faster?
• Better test scores?
• Fatigue?
• Attention focused on lecture?
• End-of-class evaluations?

Page 26:

• Manipulation: Professor style
• Measure: Test scores, end-of-class evaluations

Page 27:

Between subjects design

Half the participants see one version; the other half see the other.

Page 28:

Within subjects design!

Everyone uses both versions

Page 29:

How Can We Address Ordering Effects?

Page 30:

How about individual differences?

Randomization washes them out.

Page 31:

What about for Three or More Alternatives?

Page 32:

Latin Square

1 2 3

2 3 1

3 1 2
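The square above is the cyclic construction: each row is the previous one shifted by one position, so every condition appears exactly once in each row (participant ordering) and once in each column (position in the sequence). A minimal sketch (mine, not from the slides) that generates such a square for any number of conditions:

```python
def latin_square(n):
    """Cyclic n x n Latin square: row i is 1..n shifted by i.

    Every condition appears exactly once per row (a participant's
    ordering) and once per column (position in the sequence).
    """
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]

for row in latin_square(3):
    print(*row)
```

A cyclic square balances which position each condition appears in, but not which condition precedes which; when first-order carryover matters, a balanced (Williams) Latin square is the usual refinement.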

Page 33:

What happens if we don’t randomize?

Page 34:

Self-selection effects
• Typing in the morning versus the afternoon

Page 35:

Learning effects
• Showing alternatives in sequence

Page 36:

At a high level: Should every participant use every alternative?

Page 37:

Two major strategies
• Within-subjects: everyone tries all the options. Good when you’re not worried about learning/practice/exposure issues (that trying one version will ‘pollute’ the data from another version).
• Between-subjects: each person tries one. Requires more people, and more attention to fair assignment. Has the benefit that each participant is uncorrupted (at least by the study...).
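As an illustration, both strategies can be sketched as random assignment in code. This is a hypothetical sketch (function names are mine, not from the lecture): between-subjects shuffles and then deals people evenly across conditions; within-subjects gives everyone every condition in an independently randomized order.

```python
import random

def assign_between(participants, conditions, seed=None):
    """Between-subjects: shuffle, then deal participants evenly
    across conditions so each person sees exactly one version."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: conditions[i % len(conditions)]
            for i, p in enumerate(shuffled)}

def assign_within(participants, conditions, seed=None):
    """Within-subjects: everyone gets every condition, each in an
    independently shuffled order to spread out ordering effects."""
    rng = random.Random(seed)
    orders = {}
    for p in participants:
        order = list(conditions)
        rng.shuffle(order)
        orders[p] = order
    return orders
```

For example, `assign_between(["p1", ..., "p12"], ["A", "B"])` yields six participants per condition, with membership decided by the shuffle rather than by who signed up for the morning slot.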

Page 38:

2) In-person experiments

Page 39:

Set a clear, focused goal
• Scope: make a meeting-room booking system for Gates.
• Purpose: create a system that encourages people to not overbook the length of time they need.
• Hypothesis: splitting the booking process over several screens will encourage more thought, and people will book better.

Page 40:

Recruit to match that goal
• Schedule and location: next week, Stanford Gates building
• Participants: 12 people (4 students, 4 office administrators, 4 professors)

Page 41:

Create concrete tasks

“Book a room sometime next week for a research group meeting. Andrew will be out of town, so we won’t hear his weekly update. The rest of us should be present and give our updates. Besides the usual group members, we’ll have two visitors from France who will present their research -- maybe they’ll take 10 minutes each.

When you’re done booking the room, tell Arvind so he can prepare the next task for you.”

Page 42:

“We are testing our design: we are not testing you.”

Page 43:

Experimental details
• Task ordering: start simple, then build up to complex tasks
• Training: is this a walk-up-and-use system? Or will real users receive training?
• What if someone doesn’t finish?

Page 44:

Always pilot your study before launching
• A pilot study is a practice run before the experiment
• Pilots = low-cost prototypes of experimental design
• Even if you are running behind schedule, do a pilot study

Why pilot?
• Debug study protocols
• Catch errors early so they don’t mess up your results

Run two pilots
• First: friends + colleagues
• Second: real users

Page 45:

Options for capturing results
• Bring a notebook for freeform feedback
• Instrumented software
• Video recording
• Screen recording
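One way the “instrumented software” option might look in practice: log timestamped events during each session, then compute measures such as task completion time afterwards. A hypothetical sketch (class and field names are mine, not from the lecture):

```python
import time

class EventLog:
    """Append-only event log for an instrumented study session."""

    def __init__(self):
        self.events = []

    def record(self, participant, event, **details):
        """Record one timestamped event, e.g. a click or task boundary."""
        self.events.append({"t": time.time(),
                            "participant": participant,
                            "event": event,
                            **details})

    def task_duration(self, participant, task):
        """Seconds between task_start and task_end for one task,
        or None if the task was never both started and finished."""
        ts = [e["t"] for e in self.events
              if e["participant"] == participant
              and e.get("task") == task
              and e["event"] in ("task_start", "task_end")]
        return max(ts) - min(ts) if len(ts) >= 2 else None

# Example session: one participant books a room
log = EventLog()
log.record("p1", "task_start", task="book_room")
log.record("p1", "click", task="book_room", target="submit")
log.record("p1", "task_end", task="book_room")
```

The same log also answers the “what if someone doesn’t finish?” question: an unfinished task simply has no `task_end` event.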

Page 46:

Think-aloud protocol
• Need to know what users are thinking, not just what they are doing
• Ask users to talk while performing tasks:
  • tell us what they are thinking
  • tell us what they are trying to do
  • tell us questions that arise as they work
  • tell us things they read
• Prompt the user to keep talking: “Tell me what you are thinking”
• Make a recording or take good notes; make sure you can tell what they were doing

Page 47:

3) Online field experiments

Page 48:

Relative merits: online vs. in-person
• Offline: deep understanding
• Online: scale and resolution
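Online experiments usually assign conditions deterministically, so a returning user always sees the same variant. A hypothetical sketch of hash-based bucketing (names are mine, not from the lecture):

```python
import hashlib

def ab_bucket(user_id, experiment, variants=("A", "B")):
    """Deterministic A/B assignment: hash the user id together with
    the experiment name, then map the hash onto a variant. The same
    user always lands in the same bucket for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Keying the hash on both the experiment name and the user id keeps each user’s assignment stable within an experiment while keeping assignments uncorrelated across experiments.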

Pages 49-53:

(Screenshots) Interface courtesy of A List Apart: http://www.alistapart.com/articles/designcancripple


Page 56:

Ways design makes a difference
• Position and color of a call to action
• Position on the page of testimonials, if used
• Whether linked elements are in text or as images
• Amount of white space on a page, giving the content space to “breathe”
• Position and prominence of the main heading
• Number of columns used on the page
• Number of visual elements competing for attention
• Attributes of people and objects in photos

Pages 57-58:

(Wireframes) Courtesy Forrest Glick, Stanford STVP: http://project8180.stanford.edu/category/wireframes/

Page 59:

Results
• Version A (traditional version) was sent to 6272 users. Opened: 1638 - Click-thrus: 722 - Forwards: 4
• Version B (Quick Shots version) was sent to 6263 users. Opened: 1769 - Click-thrus: 922 - Forwards: 14
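A natural follow-up question is whether the click-thru gap between the two versions exceeds what chance would produce. A sketch using a standard two-proportion z-test on the counts above (the choice of test is mine; the slides don’t specify one):

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test with pooled standard error.

    Returns (z, p_value) for H0: the two underlying rates are equal.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Click-thru counts from the slide: A = 722/6272, B = 922/6263
z, p = two_proportion_ztest(722, 6272, 922, 6263)
print(f"A: {722/6272:.1%}  B: {922/6263:.1%}  z = {z:.2f}  p = {p:.2g}")
```

With samples this large, even a few percentage points of difference is far outside the range of random fluctuation, which is exactly the “scale and resolution” advantage of online experiments.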

Page 60:

Courtesy Dan Siroker, Stanford HCI Seminar: http://hci.stanford.edu/courses/cs547/speaker.php?date=2009-05-08

Page 61:

Which option performed best?
• Sign up
• Learn More
• Sign up Now
• Join Us Now

Page 62:

Now for the visual material: five options


Page 64:

Which do you think did best?
• Family Image
• Change Image
• Barack Video
• Springfield Video
• Sam’s Video


Page 66:

Here we saw
• Small changes: big difference
• Our expectations are often wrong

Page 67:

Which got the most clickthroughs?
• I’m on twitter
• Follow me on twitter
• You should follow me on twitter
• You should follow me on twitter here

Page 68:

Which got the most clickthroughs?
• 4.70% - I’m on twitter
• 7.31% - Follow me on twitter
• 10.09% - You should follow me on twitter
• 12.81% - You should follow me on twitter here

Page 69:

Typography Experiment: Color Contrast on MSN Live Search
• A: Softer colors vs. B: High contrast
• Queries/user up 0.9%
• Ad clicks/user up 3.1%

Content courtesy Ron Kohavi

Page 70:

Large scale changes design
• Making small but consequential differences detectable
• Small differences accumulate
• Watch out for spurious results: do you trust your measurements?

Page 71:

Unexpected changes in a checkout page
• Which version produces more purchases?

Page 72:

The cost of one extra data field

Page 73:

Small distractions such as extra fields can yield big changes

Page 74:

Small changes have positive impact, too
• Version B: 2x the response rate of Version A

Courtesy Greg Linden’s blog: http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html

Page 75:

Fewer options; custom response
• Version C: 3.5x the response rate of Version B!

Page 76:

Why?
• Commitment escalation: if they agree to do a little bit, then add more later, they’re much more likely to do it than if you ask for everything up front.

Page 77:

The big picture: continuous iteration
• Iterative design plus controlled experimentation is a formidable combination for quickly improving your design.

Page 78:

Design in the online age
• The designer’s role shifts to being about creating multiple alternatives
• People are often too sure of themselves
• Rapid experimentation means the first release is (sometimes) less important -- fail fast