
Page 1

Christos Katsanos | [email protected]

Nikolaos Tselios | [email protected]

Nikolaos Avouris | [email protected]

Are Ten Participants Enough for Evaluating Information Scent of Web Page Hyperlinks?

IFIP INTERACT | Uppsala, Sweden | 24-28 August, 2009

Page 2

Purpose & Motivation


A critical factor in web navigation is information scent (Fu & Pirolli, 2007; Blackmon et al., 2005; Miller & Remington, 2004):

the user's assessment of the semantic relevance of the navigation options on a webpage

Often, participants are asked to evaluate scent by providing ratings (Miller & Remington, 2004; Brumby & Howes, 2008)

It remains unclear how many raters are required to obtain representative estimates of information scent.

Page 3

The Study: First Phase


Page 4

Design & Procedures

Web-based survey

Rate the semantic relevance of all links to the provided goal (1 = poor relevance, 5 = high relevance).

101 participants; 8 navigation menus, 8 links each

6464 ratings
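The aggregation implied by the slide — one scent score per link, averaged across raters — can be sketched as follows. The rating matrix here is randomly generated as a hypothetical stand-in for the actual survey data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the survey data: 101 participants rate
# 8 menus x 8 links = 64 hyperlinks for relevance on a 1-5 scale,
# giving 101 x 64 = 6464 ratings in total.
ratings = rng.integers(1, 6, size=(101, 64))

# The scent estimate for each link is its mean rating across raters.
scent_scores = ratings.mean(axis=0)

print(ratings.size)        # total number of ratings
print(scent_scores.shape)  # one score per link
```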

Page 5

Analysis Methodology

Reference case = scent ratings from 101 participants

Select 10 random samples of each size N, where N = 2, 5, 10, 15, 20, 25, 30, 40 and 50

Compare [sample ratings] vs [ratings from all 101 participants] using the average Spearman correlation

How many raters are enough to represent the ratings of the whole dataset?
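The sampling procedure on this slide can be sketched in Python; SciPy's `spearmanr` provides the rank correlation, and the rating matrix is again a random stand-in for the real data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-in for the survey data: 101 raters x 64 links.
ratings = rng.integers(1, 6, size=(101, 64))
reference = ratings.mean(axis=0)  # scent scores from all 101 raters

def avg_sample_correlation(n, n_samples=10):
    """Average Spearman r between scent scores computed from
    random n-rater samples and the whole-dataset scores."""
    rs = []
    for _ in range(n_samples):
        idx = rng.choice(ratings.shape[0], size=n, replace=False)
        rho, _p = spearmanr(ratings[idx].mean(axis=0), reference)
        rs.append(rho)
    return float(np.mean(rs))

for n in (2, 5, 10, 15, 20, 25, 30, 40, 50):
    print(n, round(avg_sample_correlation(n), 2))
```

Larger samples should track the full panel more closely, which is exactly the curve the results slide summarizes.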

Page 6

Results


[Figure: average correlation with the whole dataset by sample size; error bars = (r_MEAN ± r_SD)²]

10 raters capture 84–90% of the total variance; doubling the raters yields essentially the same result, and tripling them brings the ratings only ~5% closer to the whole dataset.

Page 7

First-phase: Conclusion

10 raters appear to be a cost-effective solution for evaluating information scent without compromising the quality of the results


But how close are scent-ratings of 10 participants to observed navigation behavior?

Page 8

The Study: Second Phase


Page 9

Design & Procedures

Eye-tracking user study

Perform the same 8 navigation tasks used in the first phase

54 users (not involved in the first phase)

Two measures of users' behavior: clicks on each link, and fixations (adjusted for text length) on each link.

432 recordings

Page 10

Analysis Methodology

Reference case = behavioral data from 54 users

Compare [scent ratings from the samples of the 1st phase] vs [measures of users' navigation behavior, 2nd phase] using the average Spearman correlation

How many raters are enough to reach an acceptable level of correlation with these two measures?
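Under the same assumptions as before (randomly generated stand-ins for the actual ratings and for the per-link click counts), the second-phase comparison can be sketched as:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Hypothetical stand-ins: first-phase scent ratings (101 raters x 64
# links) and second-phase behavioral data (click counts per link).
ratings = rng.integers(1, 6, size=(101, 64))
clicks = rng.poisson(5.0, size=64)

def avg_behavior_correlation(n, n_samples=10):
    """Average Spearman r between scent scores from random n-rater
    samples and the observed per-link click counts."""
    rs = []
    for _ in range(n_samples):
        idx = rng.choice(ratings.shape[0], size=n, replace=False)
        rho, _p = spearmanr(ratings[idx].mean(axis=0), clicks)
        rs.append(rho)
    return float(np.mean(rs))

# Compare samples of 10 raters against the full 101-rater panel.
r_10 = avg_behavior_correlation(10)
r_all, _p = spearmanr(ratings.mean(axis=0), clicks)
print(r_10, r_all)
```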

Page 11

Results


Clicks on each link: r for 10 raters differs by 0.7% from r for 101 raters (r_101-raters = 0.80, p < .01)

Fixations on each link: r for 10 raters differs by 7.4% from r for 101 raters (r_101-raters = 0.40, ns)

[Figure: correlation with user behavior by sample size; error bars = r_MEAN ± r_SD]

Page 12

Second-phase: Conclusion

10 participants provide scent ratings that are close to the observed link-selection behavior (clicks) and distribution of attention (fixations)


However, scent ratings should be used only as a rough indicator of users' distribution of attention (r_s = 0.40, ns)

Page 13

Summary & Questions

Investigated the well-known "how many users" debate in the context of information scent evaluation

Scent ratings from 10 participants appeared to be enough for a discount evaluation of information scent

More studies are required in the context of highly specialized domains and/or varied user-group compositions

Christos Katsanos | [email protected]

Page 14

EXTRA SLIDES


Page 15

First-Phase: Question example


Page 16

Second-Phase: How many users are enough?


[Figure: clicks count vs. observations count]