field studies

Post on 25-Feb-2016



FIELD STUDIES

User studies
  Ubicomp: people use technology
  Must conduct user studies
  Also:
    Focus groups
    Ethnographic studies
    Heuristic evaluations
    Etc.

User studies
  Laboratory studies:
    Controlled environment
  Field (in-situ) studies:
    Real world

Field studies
  Appropriate for ubicomp:
    Abundant data
    Observe unexpected challenges
    Understand impact on lives
  Trade-off:
    Loss of control
    Significant time and effort

Three common types
  Current behavior
  Proof of concept
  Experience with prototype

How to think about user studies?

Formulate hypotheses

Research steps
1. State problem(s)
2. State goal(s)
3. Propose hypotheses
4. Propose steps to test hypotheses
5. Explain how problem(s), goal(s) and hypotheses fit into existing knowledge
6. Produce results of testing hypotheses
7. Explain results
8. Evaluate research
9. State new problems

What is a hypothesis?
  Proposing an explanation
  Theory or hypothesis? “This is just a theory.”
  Some theories we live by (“just” not justified):
    Newton’s theory of motion
    Einstein’s theory of relativity
    Evolutionary theory

Hypothesis
  Must be tentative
  Must predict

Hypothesis
  Some criteria of scientificity:
    Self-consistent
    Grounded (fits bulk of relevant knowledge)
    Accounts for empirical evidence
    Empirically testable by objective procedures of science
    General in some respect and to some extent

On proposing hypotheses
  Anomalous phenomena:
    Strange and unfamiliar (Bermuda triangle)
    Familiar yet not fully understood (cognitive load)
  Is there already an explanation?

Types of hypotheses
  Incremental
  Fundamental shift:
    Ptolemy (c. 90 – c. 168): geocentric cosmology
    Copernicus (1473 – 1543): heliocentric cosmology
  And then came…
    Kepler (1571 – 1630): elliptical orbits

Fundamental shift example
  Ulcer:
    Stress?
    Spicy food?
    Bacteria.

Types of proposed explanations
  Causes
  Correlation
  Causal mechanisms
  Underlying processes
  Laws
  Functions

Proposing causal explanations
  Studies show that using a cell phone while driving increases the probability of getting into an accident.
  Why is that so?
    Pick up ringing phone
    Dial number
    See but don’t perceive

Effects not always there
  Cell phone + driving:
    Usually no accident
    Only one of the factors

Remote and proximate causes
  Cell phone + driving:
    Attention shift → missed signal → accident
    Remote cause → proximate cause → effect

Correlation
  A and B are correlated if:
    A → B
    B → A
    C → A and C → B
    A combination of (some of) the above
    Coincidence

Correlation vs. causal relation:
  Correlation doesn’t imply causal relation
  Cannot determine cause direction (A → B or B → A)

Correlation
  Positive, negative
  None found ≠ none exists
  Causal link → correlation:
    May provide initial evidence for causal link
    Less explanatory value than facts about causal links
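The "C → A and C → B" case can be made concrete with a small simulation. The sketch below is illustrative only: the variables, the noise model, and the helper names `pearson_r` and `simulate_common_cause` are assumptions, not material from the slides. A and B never influence each other, yet they come out clearly correlated because both depend on a hidden C.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_common_cause(n=5000, seed=1):
    """Generate A and B that share a hidden cause C but do not affect each other."""
    rng = random.Random(seed)
    a_values, b_values = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)                    # hidden common cause C
        a_values.append(c + rng.gauss(0, 1))   # C -> A, plus independent noise
        b_values.append(c + rng.gauss(0, 1))   # C -> B, plus independent noise
    return a_values, b_values

a, b = simulate_common_cause()
print(f"r(A, B) = {pearson_r(a, b):.2f}")
```

With this noise model the correlation is expected to sit near 0.5, and nothing in the value of r reveals that C is the cause.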

Causal mechanisms
  Mechanisms connecting remote causes and their effects. E.g.:
    Damaged artery in heart → clotting
    Clotting → blocked artery
    Blocked artery → heart attack
    Aspirin inhibits clotting → lower risk of heart attack

Underlying processes
  Photoelectric effect

Photoelectric effect
  Einstein: 1921 Nobel Prize in Physics

Laws
  General regularities in nature
  Universal:
    F = ma
  Non-universal:
    Statistical laws

Functions
  What is the purpose of the phenomenon?
  [Slide image, poster text: “FOR SALE. A prime lot of serfs or SLAVES, GYPSY (TZIGANY). Through an auction at noon at the St. Elias Monastery on 8 May 1852, consisting of 18 Men, 10 Boys, 7 Women & 3 Girls, in fine condition.”]

Functions
  William Harvey (1578 – 1657):
    Heart pumps blood through circulatory system
    No modern instruments!
    Experiments with a number of animals: various fish, snail, pigeon, etc.

Multiple methods together
  Function → causal mechanism → underlying processes
  National Ignition Facility (Dennis O’Brien @ UNH):
    Ignition with lasers → laser, target chamber → physics of nuclear fusion

Multiple methods together
  Law → underlying processes
  Isaac Newton (1643 – 1727), second law of motion:
    F = ma → graviton?

Ockham’s razor
  Crop circles: pranksters or aliens?

Ockham’s razor
  William of Ockham (c. 1288 – c. 1348)
  http://en.wikipedia.org/wiki/File:William_of_Ockham.png

Do I have a hypothesis?
  Yes. Do you realize you do?


Current behavior
  Insights and inspiration:
    State problem(s), goal(s)
    Propose hypotheses
  Relatively long

Current behavior – example 1
  AJ Brush and Kori Inkpen, “Yours, mine and ours?...” (pdf) (title inspired by the 2005 movie)
  Home technology: users share, etc.

Current behavior – example 2
  Shwetak Patel et al., “Farther Than You May Think…” (pdf)
  Hypothesis: Mobile phone a proxy to user location.


Proof of concept
  Technological advance:
    Produce results: prototype
    Explain results: prototype
  Relatively short

Proof of concept – example 1
  J. Sherwani et al., “Speech vs. Touch-tone: Telephone Interfaces for Information Access by Low Literate Users” (pdf) (video)
  Hypothesis: Speech better telephony interface than touch-tone for low literate users.

Proof of concept – example 2
  John Krumm and Eric Horvitz, “Predestination:…” (pdf)
  Hypothesis: Destinations from partial trajectories.
  Train/test algorithm on GPS tracks from 169 people
  Used pre-existing data:
    Krumm and Horvitz, “The Microsoft Multiperson Location Survey”
    Collecting original data a significant contribution
    Leverage!


Experience with prototype
  Users’ interaction with technology:
    Produce results: prototype
    Explain results: prototype
  Relatively long

Prototype an example!
  Others don’t care about:
    Raw usage information
    Usability problems
    Intricate implementation details
    Etc.
  Generalize!
    Scientific and good technical work

Experience – example 1
  C. Neustaedter et al., “A Digital Family Calendar in the Home:…” (pdf) (video)
  Hypothesis: At-a-glance awareness, remote access are significant benefits.
  4 households, 4 weeks each
  (Best Student Paper, Graphics Interface 2007)

Experience – example 2
  Rafael Ballagas et al., “Gaming Tourism:…” (pdf) (video)
  Hypothesis: Learning through a game.
  18 participants: 2 alone + 8 pairs (8 x 2 = 16)

Study design
  Who is the consumer?
    Manager(s): industry, academic lab
    Professor(s): e.g. thesis committee
    Researchers: e.g. advisor’s collaborators
    Reviewers: for paper, proposal, thesis
    Funding agency: report on progress, proposal for funding
    Public: friends, family, alumni, potential students, donors, potential employers

Study design
  How can I explain this to a layperson?
    What is key?
    What can be omitted?
  How will I write this up?
    Paper
    Thesis
    Report
    Blog post
  Start writing paper/thesis/report/blog post at the beginning of the study.

Study design
  Test hypothesis/hypotheses

Testing hypotheses via user studies
  User studies:
    Laboratory studies
      Good: Control, easier to evaluate results
      Bad: Constraints
    Field studies
      Good: Fewer constraints
      Bad: Less control, more difficult to evaluate results

Criteria
  Falsifiability:
    Prediction fails = explanation isn’t correct
    Account for other factors!
  Note:
    Criterion - singular
    Criteria - plural

Criteria
  Verifiability:
    Prediction successful = explanation is correct
    Account for other factors!

The meat of it…
  Battleship Potemkin, 1925 film
  Rotten meat scene

Why larvae in meat?
  Francesco Redi (1626-1697)
  Generation of insects, 1668
  Causal explanation: fly droppings

Redi’s research
  Hypothesis:
    Worms derived from fly droppings
  Testing hypothesis:
    Two sets of flasks with meat: sealed and open
  Prediction: worms only in open flask

Falsifiability criterion
  Can anything cause a failed prediction even if explanation is correct?
  Did the apparatus operate properly?
    Tight seal?
    Meat not initially spoiled?
    Other?

Verifiability criterion
  Can anything result in successful prediction even if explanation is wrong?
  What if “active principle” in the air is responsible for spontaneous generation?
  Modify experiment:
    Replace seal with veil:
      Flies cannot reach meat
      Air in contact with meat
  Modification helps meet verifiability criterion

Verifiability criterion
  Experimental vs. control group:
    Only difference in level of one independent variable
  Redi’s experiment:
    Control: Open flasks
    Experimental: Veil-covered flasks

Control: laboratory experiment
  Meat in veil-covered flasks?
  Creating control/experimental groups often impossible without careful design/control

Study design
  Test hypothesis/hypotheses
  Formulate in terms of:
    Independent variables (multiple conditions)
    Dependent variables
  Design:
    Within-subjects
    Between-subjects
    Mixed design

Within-subjects design: example
  Police radio UI:
    Hardware
    Speech
  Blog post, video

Within-subjects design: example

Results in graphical form:


Example: between-subjects design

Classical example: testing a drug
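In a between-subjects drug test each participant experiences exactly one condition, so the assignment to conditions should be random. The following is a minimal sketch of that step; the group names "drug" and "placebo", the participant IDs, and the helper `assign_groups` are assumptions made for illustration and do not come from the slides.

```python
import random

def assign_groups(participant_ids, groups=("drug", "placebo"), seed=0):
    """Randomly assign each participant to exactly one group (between-subjects).

    Shuffling first and then dealing round-robin keeps the group sizes as
    equal as possible. The group names are hypothetical placeholders.
    """
    rng = random.Random(seed)        # fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: groups[i % len(groups)] for i, pid in enumerate(shuffled)}

assignment = assign_groups([f"P{n:02d}" for n in range(1, 11)])
for pid in sorted(assignment):
    print(pid, assignment[pid])
```

Each participant contributes data to one condition only, which is one reason between-subjects designs tend to need more participants than within-subjects designs.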

Mixed design: example 1
  SUI characteristics study
  Secondary task: speech control of radio
  2 x 2 x 2 design:
    SR accuracy: high/low
    PTT button: yes/no – ambient recognition
    Dialog repair strategy: mis-/non-understanding

Mixed design: example 2
  Motivation: PTT vs. driving performance
  Secondary task: speech control of radio
  2 x 3 x 3 design:
    SR accuracy: high/low
    PTT activation: push-hold-release/push-release/no push
    PTT button: ambient/fixed/glove

               Push-hold-release          Push-release               No-push
SR accuracy    Ambient  Fixed  Glove      Ambient  Fixed  Glove      Ambient  Fixed  Glove
High
Low
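The cells of the 2 x 3 x 3 design can be enumerated mechanically, which helps when planning sessions or checking that no cell was forgotten. This is a sketch only; the variable names are assumptions, and the factor levels are copied from the design listed for example 2.

```python
from itertools import product

# Factor levels of the 2 x 3 x 3 design (variable names are illustrative).
sr_accuracy = ["high", "low"]
ptt_activation = ["push-hold-release", "push-release", "no-push"]
ptt_button = ["ambient", "fixed", "glove"]

# Every combination of one level per factor is one experimental condition.
conditions = list(product(sr_accuracy, ptt_activation, ptt_button))

print(len(conditions))  # 18
for accuracy, activation, button in conditions:
    print(f"SR accuracy={accuracy}, activation={activation}, button={button}")
```

Which factors vary within a participant and which vary between participants is a separate decision; the slides do not specify that split for this study.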

Control condition
  Baseline: e.g. no technology vs. later introduced technology

Considerations
  What will subjects do?
    Normal behavior – may take long
    Scenarios
  Augment existing or brand new?
    Augment: taking advantage of familiarity
    New: more control (fewer inherited constraints)
  Simulate or implement?
    E.g. Wizard of Oz (WoZ)

Data to collect
  Qualitative
    Insight into what participants did.
    How do participants compare? Did they do what they thought they did? Use quantitative data.
  Quantitative
    How did people behave?
    But why? Use qualitative data.

Data to collect
  At least three types of data:
    Demographic
    Usage
    Reactions

Data to collect
  Run pilot experiments!

Collecting data
  Logging
  Surveys
  Experience sampling
  Diaries
  Interviews
  Unstructured observation – ethnography

Logging
  Plan ahead, not after the fact!
    Testing hypotheses
    Don’t leave important data out
    Don’t save data you don’t need
  Leverage logging: Everything OK?
    E.g. Mike Farrar’s MS research: files appearing on server indicate apps OK
    Explicit communication with server: “I’m OK!”
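One way to act on the "I'm OK!" idea is to record when each deployed device last reported in and to flag the ones that have gone quiet. The sketch below assumes a simple in-memory mapping from device ID to last heartbeat time; the function name `silent_clients`, the 24-hour threshold, and the household IDs are hypothetical and are not taken from the study mentioned above.

```python
from datetime import datetime, timedelta

def silent_clients(last_seen, now, max_silence=timedelta(hours=24)):
    """Return IDs of deployed clients whose last "I'm OK!" message is too old.

    last_seen maps a client ID to the datetime of its most recent heartbeat.
    """
    return sorted(cid for cid, seen in last_seen.items() if now - seen > max_silence)

# Hypothetical heartbeat log for three field deployments.
now = datetime(2024, 1, 15, 12, 0)
last_seen = {
    "household-1": now - timedelta(hours=2),
    "household-2": now - timedelta(hours=30),   # silent for more than a day
    "household-3": now - timedelta(minutes=5),
}
print(silent_clients(last_seen, now))  # ['household-2']
```

In a field study a check like this lets the researcher contact a household before days of data are lost.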

Surveys
  Open-ended
  Multiple-choice
  Likert-scale

Surveys
  Questions should allow positive and negative feedback.
  Text clear to others?
    Check!
  One question at a time!
    “Fun and easy to use?”
  Length?
    Don’t bore subjects to death.
  Standard questions (e.g. QUIS)?
  Previously used questions?

Example: Mike Farrar’s study
  Hypotheses:
    Initialize grammar (video):
      From previous tags
      From tags by users with similar interests
    Voice commands convenient way to tag photos (video)
    Keyboard users will use voice less
    Low task completion: give up on voice

Experience sampling (ESM)
  Short questionnaire
  Timing:
    Random
    Scheduled
    Event-based

Experience sampling (ESM)
  How often?
  How many?
  Relate to quantitative data?
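For the random timing option, "how often" and "how many" turn into parameters of a schedule generator. The sketch below draws a fixed number of prompt times inside a daily window while keeping a minimum gap between prompts. The function `random_prompt_times`, the window, and the gap are all assumptions chosen for illustration; event-based prompting would need application-specific triggers and is not covered.

```python
import random
from datetime import datetime, timedelta

def random_prompt_times(day_start, day_end, n_prompts, min_gap, seed=None):
    """Pick n_prompts random times in [day_start, day_end], at least min_gap apart.

    Technique: sample offsets in a window shortened by the total required gap,
    sort them, then push the i-th prompt later by i * min_gap.
    """
    rng = random.Random(seed)
    slack = (day_end - day_start) - (n_prompts - 1) * min_gap
    if slack < timedelta(0):
        raise ValueError("window too short for this many prompts and this gap")
    offsets = sorted(rng.uniform(0, slack.total_seconds()) for _ in range(n_prompts))
    return [day_start + timedelta(seconds=o) + i * min_gap
            for i, o in enumerate(offsets)]

# Hypothetical schedule: 6 prompts between 09:00 and 21:00, at least 1 hour apart.
start = datetime(2024, 1, 15, 9, 0)
end = datetime(2024, 1, 15, 21, 0)
for t in random_prompt_times(start, end, 6, timedelta(hours=1), seed=42):
    print(t.strftime("%H:%M"))
```

Logging the actual prompt times alongside usage logs makes it possible to relate the questionnaire answers to the quantitative data.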

Diaries
  Similar to ESM

Interviews
  Semi-structured:
    List of specific questions + follow-up questions
  Bring data
    E.g. Nancy A. Van House: “Flickr and Public Image Sharing:…”
    Interviews + photo elicitation

Interviews
  Neutral questions
  Negative feedback is OK (this is hard):
    Don’t argue!

Participants
  Follow IRB rules

Participants
  Who to recruit?
    Representative of intended users
    Not your friends, family, colleagues – bias!
    May need different types
      Recruit sufficient numbers of each type

Participant profile
  Age
    E.g. age significant for driving
  Gender
  Technology use and experience
  Other
    Eye tracker studies: no glasses

Number of participants
  Between-subjects usually requires more than within-subjects
  Proof-of-concept: typically fewer and many types
  Longer study: may be able to use fewer
  Time commitment per participant is significant!
    Recruit (Craigslist), organize, train, run, transfer data, process data
  Participants will drop out – recruit extra
    Counterbalancing may not work out
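Why dropouts can break counterbalancing is easier to see with a concrete assignment. The sketch below uses a cyclic Latin square, which is one common counterbalancing scheme and an assumption here, since the slides do not name a scheme. The condition names and participant IDs are also illustrative. A cyclic square balances the position of each condition but not which condition precedes which.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: each condition appears once in every position."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def assign_orders(participant_ids, conditions):
    """Give participant i the order in row (i mod n) of the Latin square."""
    orders = latin_square_orders(conditions)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}

def position_counts(assignment, conditions):
    """Count how often each condition was run first, second, ... across participants."""
    counts = {c: [0] * len(conditions) for c in conditions}
    for order in assignment.values():
        for position, condition in enumerate(order):
            counts[condition][position] += 1
    return counts

conditions = ["push-hold-release", "push-release", "no-push"]
planned = assign_orders([f"P{n}" for n in range(1, 7)], conditions)
print(position_counts(planned, conditions))     # balanced: every count is 2

# Hypothetical dropout: P5 never finishes, so positions are no longer balanced.
completed = {pid: order for pid, order in planned.items() if pid != "P5"}
print(position_counts(completed, conditions))
```

Recruiting a spare participant for each row of the square is one way to restore the balance after a dropout.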

Compensation
  Don’t try to save on this!
  Driving simulator lab study cost example
    1 graduate student year at UNH ≈ $50k
    Software maintenance fees per year ≈ $20k
    Trip to conference ≈ $2k
    PC or laptop ≈ $2k
    $20 x 24 participants ≈ $0.5k (less than 1%)
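The "less than 1%" figure follows directly from the numbers on the slide. The short calculation below spells out the arithmetic; the dictionary keys are just labels, and the amounts are the approximate values quoted above.

```python
# Approximate yearly costs from the slide, in US dollars.
costs_usd = {
    "graduate student (1 year)": 50_000,
    "software maintenance (1 year)": 20_000,
    "conference trip": 2_000,
    "PC or laptop": 2_000,
    "participant compensation": 20 * 24,   # $20 for each of 24 participants
}

total = sum(costs_usd.values())
share = costs_usd["participant compensation"] / total

print(f"compensation: ${costs_usd['participant compensation']} of ${total} ({share:.2%})")
# compensation: $480 of $74480 (0.64%)
```

Even doubling the payment per participant would keep compensation well under 2% of the total.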

Compensation
  Must not affect data
    E.g. in image tagging study if we paid per picture:
      More data
      Unrealistic as interactions are for money not for value of prototype

Compensation
  Leverage if you can:
    Latest driving simulator lab study in collaboration with Microsoft Research:
      Use Microsoft software as compensation

Data analysis
  Test hypotheses
  Use multiple data types
  Tell a story

Data analysis
  Statistics:
    Descriptive
    Inferential

Descriptive statistics
  Level of measurement:
    Nominal
    Ordinal
    Interval


Level of measurement
  Nominal:
    Unordered categories
    E.g. yes/no
    Valid to report:
      Frequency

Level of measurement
  Ordinal:
    Rank order preference without numeric difference
    E.g. responses on Likert scale
      Five of the eight participants strongly agreed or agreed with the following statement: “I prefer to have a GPS screen for navigation.”
    Valid to report:
      Frequency
      Median
    Some people report means but what is the mean of “strongly agree” and “strongly disagree”?
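Frequencies and the median can be computed for ordinal data without pretending the labels are numbers with meaningful distances. The sketch below uses eight made-up responses chosen so that five of them are favourable, matching the example sentence; the individual responses are hypothetical. The ranks are used only for ordering.

```python
from collections import Counter
from statistics import median_low

SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: i for i, label in enumerate(SCALE)}   # order only, not distance

# Hypothetical responses from eight participants.
responses = ["strongly agree", "agree", "agree", "strongly agree",
             "agree", "neutral", "disagree", "strongly disagree"]

frequency = Counter(responses)
favourable = frequency["agree"] + frequency["strongly agree"]

# median_low returns an actual data point, so it always maps back to a label.
median_label = SCALE[median_low([RANK[r] for r in responses])]

print(f"{favourable} of {len(responses)} agreed or strongly agreed; "
      f"median response: {median_label}")
# 5 of 8 agreed or strongly agreed; median response: agree
```

Using `median_low` rather than `median` is a design choice: with an even number of responses the ordinary median could fall between two labels, which has no meaning on an ordinal scale.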

Level of measurement
  Interval:
    Numerical differences significant
    E.g. age, number of times an action occurred, etc.
    Valid to report:
      Sum
      Mean
      Median
      Standard deviation (outliers?)

Outliers in interval data
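A single outlier can pull the mean and standard deviation far from what most participants did, while the median barely moves. The sketch below uses invented action counts and flags outliers with the common 1.5 x IQR rule. Both the data and the choice of rule are assumptions, since the slides do not say how outliers were identified.

```python
from statistics import mean, median, quantiles, stdev

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical number of times each of eight participants performed an action.
counts = [12, 15, 14, 13, 16, 15, 14, 95]

print("sum:", sum(counts))            # 194
print("mean:", mean(counts))          # 24.25
print("median:", median(counts))      # 14.5
print("standard deviation:", round(stdev(counts), 1))
print("outliers:", iqr_outliers(counts))   # [95]
```

Whether to drop, keep, or separately report the participant with 95 actions is a judgment call that should be stated in the write-up.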

Inferential statistics
  Significance tests
    t-test
    ANOVA
    Many others
  Which to use: depends on data
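To show what a one-way ANOVA actually computes, the sketch below derives the F statistic from the between-group and within-group sums of squares. The function name `one_way_anova_f` and the PDT-like percentages are invented for illustration and are not data from any study cited here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical percentages for two navigation aids, four participants each.
spoken_only = [96.1, 97.4, 96.8, 97.0]
standard = [90.2, 91.0, 89.7, 90.9]

f_stat, df_between, df_within = one_way_anova_f([spoken_only, standard])
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

Turning F into a p-value requires the F distribution, which a statistics package provides. With only two groups, the one-way ANOVA is equivalent to an independent-samples t-test (F equals t squared). This version treats the groups as independent, so a within-subjects study would call for a repeated-measures analysis instead.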

Significance test: example 1
  To assess the effect of different navigation aids on visual attention, we performed a one-way ANOVA using PDT as the dependent variable. As expected, the time spent looking at the outside world was significantly higher when using spoken directions as compared to the standard PND directions, p<.01. Specifically, for spoken directions only, the average PDT was 96.9%, while it was 90.4% for the standard PND.

Significance test: example 2

[Figure: PDT on standard PND [%] (axis from -5 to 20) versus distance from previous intersection [m], in bins 60-80, 80-100, 100-120, 120-140, 140-160]

… PDT on the PND screen changes with the distance from the previous intersection… significant main effect, p<.01…

Significance test: example 3
  Randomization test
    Kun et al. (pdf)
    Idea from Veit et al. (pdf)
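The general idea of a randomization test can be sketched in a few lines: shuffle the group labels many times and see how often a difference at least as large as the observed one arises by chance. This is a generic two-sample version written for illustration. It is not claimed to be the procedure used by Kun et al. or Veit et al., and the function name and data are hypothetical.

```python
import random
from statistics import mean

def randomization_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided randomization test on the difference of group means.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                       # random reassignment of labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Adding 1 to both counts includes the observed labeling itself.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical measurements under two conditions.
standard = [21, 25, 19, 24, 23, 26]
spoken_only = [12, 15, 11, 14, 16, 13]
print(f"p = {randomization_test(standard, spoken_only):.4f}")
```

The test makes no assumption about the shape of the underlying distribution, which is one reason to consider it when the data do not suit a t-test or ANOVA.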

Significance test: example 3

[Figure: Rstw [degrees^2] (axis from 0 to 35) versus lag [seconds] (0 to 8); legend: standard, p = 0.05, spoken only]
