
Page 1

Christos Katsanos | [email protected]

Nikolaos Tselios | [email protected]

Nikolaos Avouris | [email protected]

Are Ten Participants Enough for Evaluating Information Scent of Web Page Hyperlinks?

IFIP INTERACT | Uppsala, Sweden | 24-28 August, 2009

Page 2

Purpose & Motivation


A critical factor in web navigation is information scent (Fu & Pirolli, 2007; Blackmon et al., 2005; Miller & Remington, 2004):

the user's assessment of the semantic relevance of the navigation options on a webpage

Often, participants are asked to evaluate scent by providing ratings (Miller & Remington, 2004; Brumby & Howes, 2008)

It remains unclear how many raters are required to obtain representative estimates of information scent.

Page 3

The Study: First Phase


Page 4

Design & Procedures

Web-based survey

Rate the semantic relevance of all links to the provided goal (1 = poor relevance, 5 = high relevance).

101 participants; 8 navigation menus, 8 links each

6464 ratings
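The aggregation implied by the slide — one scent score per link, averaged across raters — can be sketched as follows. The rating matrix here is randomly generated as a hypothetical stand-in for the actual survey data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the survey data: 101 participants rate
# 8 menus x 8 links = 64 hyperlinks for relevance on a 1-5 scale,
# giving 101 x 64 = 6464 ratings in total.
ratings = rng.integers(1, 6, size=(101, 64))

# The scent estimate for each link is its mean rating across raters.
scent_scores = ratings.mean(axis=0)

print(ratings.size)        # total number of ratings
print(scent_scores.shape)  # one score per link
```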

Page 5

Analysis Methodology

Reference case = scent ratings from 101 participants

Select 10 random samples of each size N, where N = 2, 5, 10, 15, 20, 25, 30, 40 and 50

Compare [sample ratings] vs [ratings from all 101 participants] using the average Spearman correlation

How many raters are enough to represent the ratings of the whole dataset?
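The sampling procedure on this slide can be sketched in Python; SciPy's `spearmanr` provides the rank correlation, and the rating matrix is again a random stand-in for the real data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-in for the survey data: 101 raters x 64 links.
ratings = rng.integers(1, 6, size=(101, 64))
reference = ratings.mean(axis=0)  # scent scores from all 101 raters

def avg_sample_correlation(n, n_samples=10):
    """Average Spearman r between scent scores computed from
    random n-rater samples and the whole-dataset scores."""
    rs = []
    for _ in range(n_samples):
        idx = rng.choice(ratings.shape[0], size=n, replace=False)
        rho, _p = spearmanr(ratings[idx].mean(axis=0), reference)
        rs.append(rho)
    return float(np.mean(rs))

for n in (2, 5, 10, 15, 20, 25, 30, 40, 50):
    print(n, round(avg_sample_correlation(n), 2))
```

Larger samples should track the full panel more closely, which is exactly the curve the results slide summarizes.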

Page 6

Results


[Figure: average correlation with the whole dataset by sample size; error bars = (r_MEAN ± r_SD)²]

10 raters capture 84–90% of the total variance; doubling the raters yields essentially the same result, and tripling them brings the ratings only ~5% closer to the whole dataset.

Page 7

First-phase: Conclusion

10 raters appear to be a cost-effective solution for evaluating information scent without compromising the quality of the results


But how close are scent-ratings of 10 participants to observed navigation behavior?

Page 8

The Study: Second Phase


Page 9

Design & Procedures

Eye-tracking user study

Perform the same 8 navigation tasks used in the first phase

54 users (not involved in the first phase)

Two measures of users' behavior: clicks on each link, and fixations (adjusted for text length) on each link.

432 recordings

Page 10

Analysis Methodology

Reference case = behavioral data from 54 users

Compare [scent ratings from the samples of the 1st phase] vs [measures of users' navigation behavior, 2nd phase] using the average Spearman correlation

How many raters are enough to reach an acceptable level of correlation with these two measures?
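Under the same assumptions as before (randomly generated stand-ins for the actual ratings and for the per-link click counts), the second-phase comparison can be sketched as:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Hypothetical stand-ins: first-phase scent ratings (101 raters x 64
# links) and second-phase behavioral data (click counts per link).
ratings = rng.integers(1, 6, size=(101, 64))
clicks = rng.poisson(5.0, size=64)

def avg_behavior_correlation(n, n_samples=10):
    """Average Spearman r between scent scores from random n-rater
    samples and the observed per-link click counts."""
    rs = []
    for _ in range(n_samples):
        idx = rng.choice(ratings.shape[0], size=n, replace=False)
        rho, _p = spearmanr(ratings[idx].mean(axis=0), clicks)
        rs.append(rho)
    return float(np.mean(rs))

# Compare samples of 10 raters against the full 101-rater panel.
r_10 = avg_behavior_correlation(10)
r_all, _p = spearmanr(ratings.mean(axis=0), clicks)
print(r_10, r_all)
```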

Page 11

Results


Clicks on each link: r for 10 raters differs by 0.7% from r for 101 raters (r_101-raters = 0.80, p < .01)

Fixations on each link: r for 10 raters differs by 7.4% from r for 101 raters (r_101-raters = 0.40, ns)

[Figure: correlation with user behavior by sample size; error bars = r_MEAN ± r_SD]

Page 12

Second-phase: Conclusion

10 participants provide scent ratings that are close to the observed link-selection behavior (clicks) and distribution of attention (fixations)


However, scent ratings should be used only as a rough indicator of users' distribution of attention (r_s = 0.40, ns)

Page 13

Summary & Questions

Investigated the well-known "how many users" debate in the context of information scent evaluation

Scent ratings from 10 participants appeared to be enough for a discount evaluation of information scent

More studies are required in the context of highly specialized domains and/or varied user-group compositions

Christos Katsanos | [email protected]

Page 14

EXTRA SLIDES


Page 15

First-Phase: Question example


Page 16

Second-Phase: How many users are enough?


[Figure: clicks count vs. observations count]