Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs

Marilyn Hughes Blackmon, U. of Colorado; Muneo Kitajima, AIST, Japan; Peter Polson, U. of Colorado

Blackmon, Kitajima, & Polson, CHI2005

Page 1

Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs

Marilyn Hughes Blackmon, U. of Colorado
Muneo Kitajima, AIST, Japan
Peter Polson, U. of Colorado

Page 2

Part One

Work supported by NSF Grant 01-37759 to M. H. Blackmon

http://autocww.colorado.edu/~brownr/ACWW.php
http://autocww.colorado.edu/~blackmon
http://autocww.colorado.edu

Page 3

Problem that spurred research and development of tool

Focus on users building comprehensive knowledge of a topic
  Browse complex websites (cf. search engine)
  Pure forward search
  Learn by exploration

Automatically predict what is worth repairing?
  Need accurate measure of problem severity
  Need to predict success rate for repairs

Web designers using tool must be able to do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity (background knowledge)

Page 4

Solution: Incrementally extend Cognitive Walkthrough for the Web (CWW)

CHI2002 paper tailored Cognitive Walkthrough (CW) for web navigation

Proved CWW would identify usability problems that interfere with web navigation

Substituted objective measures of similarity, familiarity, and elaboration of heading/link texts using Latent Semantic Analysis (LSA)

CHI2003 paper proved significantly better performance on CWW-repaired webpages vs. original, unrepaired pages

Page 5

Percent task failure correlated 0.93 with observed clicks (each task n≥38)

[Figure: percent task failure (y-axis, 0–100) plotted against mean observed clicks (x-axis; each task n ≥ 38). Data labels: 5, 26, 51, 76 percent task failure; x-axis values: 2.5, 5, 8, 11 clicks.]

Page 6

Research problem, reformulated: What determines mean clicks?

Identify & repair factors that increase mean clicks and raise risk of task failure

Hypothetical determinants, based on prior results and theory underlying CWW research:
  Unfamiliar correct link, i.e., insufficient background knowledge to comprehend link
  Competing headings & their high-scent links
  Competing links under correct heading
  Weak-scent correct link under correct heading

Page 7

First step: Collect enough data for multiple regression analysis

Reused 64 tasks from CHI2003 paper and ran additional experiments to get data on 100 new tasks, creating 164-task dataset

Developed automatable rules for CWW problem identification

Built multiple regression model for 164-task dataset and found 3 independent variables explaining 57% of the variance

Page 8

Multiple regression translates into formula to predict problem severity

Multiple regression analysis yielded formula for predicting mean clicks on links:
  + 2.199 (predicted clicks for non-problem)
  + 1.656 if correct link is unfamiliar
  + 0.754 times number of competing links nested under any competing heading
  + 1.464 if correct link has weak-scent
  + zero clicks for competing links under correct heading

Prediction for non-problem task = 2.199

≥2.5 mean clicks distinguishes problem from non-problem
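The regression formula on this slide can be sketched in a few lines of Python. The coefficients and the 2.5-click threshold are taken from the slide; the function names, argument names, and constant names are illustrative assumptions, not part of the authors' tool.

```python
# Coefficients copied from the slide; all names here are hypothetical.
BASELINE_CLICKS = 2.199      # predicted clicks for a non-problem task
UNFAMILIAR_LINK = 1.656      # added if the correct link is unfamiliar
PER_COMPETING_LINK = 0.754   # added per competing link under any competing heading
WEAK_SCENT_LINK = 1.464      # added if the correct link has weak scent
PROBLEM_THRESHOLD = 2.5      # >= 2.5 predicted mean clicks marks a problem


def predict_mean_clicks(unfamiliar: bool, weak_scent: bool, competing_links: int) -> float:
    """Return the predicted mean total clicks for one task."""
    if competing_links < 0:
        raise ValueError("competing_links must be non-negative")
    clicks = BASELINE_CLICKS
    if unfamiliar:
        clicks += UNFAMILIAR_LINK
    if weak_scent:
        clicks += WEAK_SCENT_LINK
    # Competing links under the correct heading contribute zero clicks,
    # so only links under competing headings are counted here.
    clicks += PER_COMPETING_LINK * competing_links
    return clicks


def is_problem(predicted_clicks: float) -> bool:
    """Apply the slide's cutoff separating problems from non-problems."""
    return predicted_clicks >= PROBLEM_THRESHOLD
```

A task with no flagged problems returns the 2.199 baseline, which falls below the cutoff; a single competing link under a competing heading is already enough to cross it.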

Page 9

Example of task: Find article about Hmong:

List of 9 categories >

Social Science >

Anthropology

Scroll A-Z list to find Hmong

Page 10

Page 11

CWW-identified problems in “Find Hmong” task: Competing headings

[Figure labels: 0.30, 0.08, 0.19]

Page 12

Predicted mean clicks for Find Hmong task on original, unrepaired webpage

  + 2.199 -- predicted clicks for non-problem
  + 1.656 -- if correct link is unfamiliar
  + 1.464 -- if correct link has weak-scent
  + 3.770 -- (0.754 * 5, the number of competing links nested under any competing heading)
  _________
    9.089 -- predicted mean total clicks

Page 13

CWW-guided repairs of navigation usability problems detected by CWW

Create alternate high-scent paths to target webpage via all correct and competing headings
  IF competing heading(s)
  IF unfamiliar correct link
  IF weak-scent correct link

Substitute or elaborate link text with familiar, higher frequency words
  IF unfamiliar correct link
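The repair rules above read like a small rule table, which might be sketched as follows. This assumes the three IF conditions on the first repair are alternatives (any one of them triggers it); the slide does not say so explicitly. The function and parameter names are hypothetical.

```python
from typing import List


def suggest_repairs(competing_headings: bool,
                    unfamiliar_link: bool,
                    weak_scent_link: bool) -> List[str]:
    """Map CWW-identified problems to the repairs listed on the slide."""
    repairs = []
    # Assumption: any one of the three problem types triggers this repair.
    if competing_headings or unfamiliar_link or weak_scent_link:
        repairs.append(
            "Create alternate high-scent paths to target webpage "
            "via all correct and competing headings"
        )
    # This repair applies only when the correct link is unfamiliar.
    if unfamiliar_link:
        repairs.append(
            "Substitute or elaborate link text with familiar, "
            "higher frequency words"
        )
    return repairs
```

A task with no identified problems gets an empty list, while an unfamiliar correct link triggers both repairs.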

Page 14

Repair benefits for “Find Hmong,” a problem definitely worth repairing

Performance measure                           Original webpage   Repaired webpage
First-click success rate                      3%                 43%
Predicted vs. actual mean total clicks***     9.1 vs. 9.0        2.2 vs. 2.1
Failure rate                                  74%                0%
Solution time                                 124 seconds        41 seconds

***Significant difference, F(1,73) = 98.9, p<.0001

Page 15

All 164 tasks: Predicted vs. observed mean total clicks

[Figure: mean total clicks by predicted problem difficulty, predicted vs. observed. Non-problem: 2.20 predicted, 2.16 observed. Problem: 5.32 predicted, 5.61 observed.]

[Figure: mean total clicks by predicted problem difficulty, predicted vs. observed. Non-problem (1.0–2.5 clicks): 2.2 predicted, 2.17 observed. Moderate problem (2.5–5.0 clicks): 3.8 predicted, 3.52 observed. Serious problem (≥5.0 clicks): 6.17 predicted, 6.43 observed.]
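The three difficulty bands in the second chart can be expressed as a simple classifier. The band boundaries come from the chart labels; treating exactly 2.5 clicks as a moderate problem follows the "≥2.5 mean clicks" rule stated earlier in the deck. The function name is a hypothetical choice.

```python
def classify_severity(predicted_clicks: float) -> str:
    """Assign a predicted mean click count to one of the chart's three bands."""
    if predicted_clicks >= 5.0:
        return "serious problem"
    if predicted_clicks >= 2.5:
        return "moderate problem"
    return "non-problem"
```

Under this sketch the non-problem baseline of 2.199 lands in the lowest band and any prediction of 5.0 or more lands in the highest.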

Page 16

Psychological validity measures for 164-task dataset

For 46 tasks predicted to have serious problems (i.e., predicted clicks ≥ 5.0)
  100% hit rate, 0% false alarms
  93% success rate for repairs (statistically significant difference repaired vs. not)

For all 75 tasks predicted to be problems
  92% hit rate, 8% false alarms
  83% success rate for repairs, significant difference repaired vs. unrepaired, p<.0001

Page 17

Cross-validation study: Replicate the model on new dataset?

Ran another large experiment to test whether multiple regression formula replicated with new set of tasks
  2 groups
  Each group did 32 new tasks, 64 total tasks
  Used prediction formula to identify problems vs. non-problems
  All tasks have just one correct link

Page 18

Multiple regression analysis produced full cross validation

Multiple regression of 64-task dataset gave same 3 determinants found for 164-task original dataset & similar coefficients

Hit rate for predicted problems = 90%, false alarms = 10%

Correct rejection for predicted non-problems = 69%, 31% misses, but 2/3 of misses had observed clicks 2.5-3.5, other 1/3 of misses >3.5 but <5.0
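The four rates on this slide could be computed as below. This is a sketch under one reading of the slide: hits and false alarms are shares of the tasks predicted to be problems, and correct rejections and misses are shares of the tasks predicted to be non-problems, with the 2.5-click cutoff applied to both predicted and observed mean clicks. All names are hypothetical, and the inputs are whatever per-task means a study provides.

```python
from typing import Dict, List, Optional

PROBLEM_THRESHOLD = 2.5  # cutoff between non-problem and problem, from the deck


def _share(flags: List[bool]) -> Optional[float]:
    """Fraction of True values, or None when the group is empty."""
    return sum(flags) / len(flags) if flags else None


def validation_rates(predicted: List[float],
                     observed: List[float],
                     threshold: float = PROBLEM_THRESHOLD) -> Dict[str, Optional[float]]:
    """Compare predicted and observed mean clicks, task by task."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    # For each predicted group, record whether the task was observed to be a problem.
    in_predicted_problems = [o >= threshold for p, o in pairs if p >= threshold]
    in_predicted_non_problems = [o >= threshold for p, o in pairs if p < threshold]
    hit = _share(in_predicted_problems)
    miss = _share(in_predicted_non_problems)
    return {
        "hit_rate": hit,
        "false_alarm_rate": None if hit is None else 1 - hit,
        "correct_rejection_rate": None if miss is None else 1 - miss,
        "miss_rate": miss,
    }
```

Within each predicted group the two rates sum to 1, which matches how the slide pairs 90% with 10% and 69% with 31%.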

Page 19

Predicted vs. observed clicks for 64 tasks in cross-validation experiment

[Figure: mean total clicks by predicted problem (yes or no), predicted vs. observed. Non-problem: 2.20 predicted, 2.20 observed. Problem: 5.43 predicted, 5.68 observed.]

Page 20

Part Two

Page 21

Theory matters: CWW is theory-based usability evaluation method

CoLiDeS cognitive model (Kitajima, Blackmon, & Polson, 2000, 2005)

Construction-Integration cognitive architecture (Kintsch, 1998), a comprehensive model of human cognitive processes

Latent Semantic Analysis (LSA)

Page 22

The Key Idea

Core process underlying Web navigation is skilled reading comprehension

Comprehension processes build mental representations of goals and webpage objects (subregions, hyperlinks, images, and other targets for action)

Action planning compares goal with potential targets for action and selects target with highest activation level

Page 23

Consensus: Web navigation is equivalent to following scent trail

Scent or residue (Furnas, 1997)

SNIF-ACT based on Information Foraging (Pirolli & Card, 1999)

Bloodhound Project: Web User Flow by Information Scent (WUFIS) => InfoScent Simulator (Chi, et al., 2001, 2003)

CWW activation level

Page 24

CoLiDeS activation level: Scent is MORE than just similarity

Adequate background knowledge to comprehend headings and links?
  Select semantic space that best matches user group
  Warning bell for low word frequency
  Warning bell for low term vector

Before computing similarity, simulate human elaboration of link texts during comprehension, using LSA Near neighbors, finding terms simultaneously familiar and similar in meaning

Compute goal-heading and goal-link similarity with LSA cosines, defining weak scent as a cosine <0.10, moderate scent as cosine ≥0.30
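The cosine thresholds on this slide can be illustrated with plain vectors. In CWW the vectors come from an LSA semantic space; the sketch below assumes they are already available as lists of numbers and shows only the cosine and the two cutoffs. The slide does not name the range from 0.10 up to 0.30, so the label for it here is an assumption, as are all function and constant names.

```python
import math
from typing import Sequence

WEAK_SCENT_BELOW = 0.10     # slide: weak scent is a cosine < 0.10
MODERATE_SCENT_FROM = 0.30  # slide: moderate scent is a cosine >= 0.30


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def classify_scent(cos: float) -> str:
    """Label a goal-heading or goal-link cosine using the slide's cutoffs."""
    if cos < WEAK_SCENT_BELOW:
        return "weak"
    if cos >= MODERATE_SCENT_FROM:
        return "moderate"
    # Hypothetical label: the slide leaves this range unnamed.
    return "between thresholds"
```

Usage would be `classify_scent(cosine(goal_vector, link_vector))`, where both vectors are taken from the semantic space chosen for the user group.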

Page 25

Conclusions: Extending CWW successful for research and development of tool

We CAN now predict severity of navigation usability problems and success rate for repairs of these problems, so we invest time to repair only what is worth repairing: tasks predicted ≥5.0 clicks

Web designers using tool CAN do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity in education level, culture, language, and field of expertise (background knowledge)

Page 26

Conclusions, continued

Scales up to large websites
Reliable (LSA measures vs. human judgments)
Psychologically valid (228-task dataset, large n gives stable mean for each task), based on cognitive model
Theory matters
Drives experimental design
High accuracy and psychological validity of tool
Practitioners and researchers can now put the tool to use with trust

Page 27

Page 28

Non-problem task Find Fern approaches asymptote of pure forward search

One-click minimum path for both problems AND non-problems

1.1 mean total clicks on links
90% pure forward search (minimum path solution)
97% of first clicks were on link under correct heading
100% success rate -- everyone finished task in 1 or 2 clicks
9 seconds = mean solution time