an eye-tracking study of user interactions with query auto completion – katja hofmann, microsoft...

Katja Hofmann An Eye-tracking Study of User Interactions with Query Auto Completion Joint work with Bhaskar Mitra, Milad Shokouhi, and Filip Radlinski @katjahofmann


Post on 15-Jan-2015


DESCRIPTION

Query Auto Completion (QAC) suggests possible queries to web search users from the moment they start entering a query. This popular feature of web search engines is thought to reduce physical and cognitive effort when formulating a query. Perhaps surprisingly, despite QAC being widely used, users' interactions with it are poorly understood. This paper begins to address this gap. We present the results of an in-depth user study of user interactions with QAC in web search. While study participants completed web search tasks, we recorded their interactions using eye-tracking and client-side logging. This allows us to provide a first look at how users interact with QAC. We specifically focus on the effects of QAC ranking, by controlling the quality of the ranking in a within-subject design. We identify a strong position bias that is consistent across ranking conditions. Due to this strong position bias, ranking quality affects QAC usage. We also find an effect on task completion, in particular on the number of result pages visited. We show how these effects can be explained by a combination of searchers' behavior patterns, namely monitoring or ignoring QAC, and searching for spelling support or complete queries to express a search intent. We conclude the paper with a discussion of the important implications of our findings for QAC evaluation.

TRANSCRIPT

Page 1: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Katja Hofmann

An Eye-tracking Study of User Interactions with Query Auto Completion

Joint work with Bhaskar Mitra, Milad Shokouhi, and Filip Radlinski

@katjahofmann

Page 2: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge
Page 3: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

How can we train and evaluate contextual QAC?

Example: context-dependent queries [Shokouhi ‘13].

sacu salt lake tribune

Page 4: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

How do searchers examine and interact with QAC?

From log data:

Click distributions for QAC on PC and iPhone [Li et al. ‘14]. [Figure: panels PC and iPhone; axes: prefix length and suggestion rank.]

Click distributions for QAC over ranks [Mitra et al. ‘14]. [Figure: click ratio by suggestion rank.]

Can infer examination from data + model (given modelling assumptions)

Model for inferring QAC examination from observed clicks [Kharitonov et al. ‘13].

Page 5: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Goal of this study

Conduct controlled experiments to understand:

How do searchers examine QAC rankings?

How does the quality of QAC rankings affect examination and usage?

Are QAC examination and usage affected by position bias?

Page 6: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Outline

Experiment

Analysis

Results

Discussion

Page 7: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Focus on QAC ranking

Main question: how does ranking quality affect examination and interaction?

Two experimental conditions, shown for the prefix "massachu|":

Original condition (production):

massachusetts

massachusetts state lottery

massachusetts unemployment

massachusetts registry of motor vehicles

massachusetts secretary of state

massachusetts department of revenue

massachusetts department of education

massachusetts general hospital

Random condition:

massachusetts unemployment

massachusetts department of education

massachusetts secretary of state

massachusetts registry of motor vehicles

massachusetts

massachusetts general hospital

massachusetts department of revenue

massachusetts state lottery

Conditions are counterbalanced in blocks so that at most 2 consecutive tasks are in the same condition.
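One way such a constraint can be met is sketched below. This is an illustration under assumptions, not the procedure used in the study: it assumes blocks of two tasks in which each condition appears once, in random order. The names `assign_conditions` and `longest_run` are hypothetical.

```python
import random

CONDITIONS = ("original", "random")


def assign_conditions(n_tasks, rng):
    """Assign one condition per task, block by block.

    Assumption: each block holds both conditions once, in shuffled order.
    A run of identical conditions can then only span a block boundary,
    so it is never longer than 2.
    """
    sequence = []
    while len(sequence) < n_tasks:
        block = list(CONDITIONS)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_tasks]


def longest_run(sequence):
    """Length of the longest run of identical consecutive items."""
    longest = current = 0
    previous = None
    for item in sequence:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


# Example: a condition sequence for the 14 study tasks of one participant.
schedule = assign_conditions(14, random.Random(0))
print(schedule, "longest run:", longest_run(schedule))
```

With an even number of tasks, this scheme also gives each participant the same number of tasks in both conditions.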

Page 8: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Search tasks

Designed 14 tasks (+2 practice tasks)

Same tasks for all participants (counterbalanced order), required to control variance.

Included navigational and closed informational tasks (easy and complex).

Included difficult-to-spell names (schwarzenegger), terms that can be abbreviated (wsj).

Example search tasks:

Find the homepage of the Massachusetts General Hospital in Boston, USA. What is their physical address? (navigational)

Japan is the 10th most populated country in the world. How many people live there? (easy informational)

How many matches did Roger Federer win against Rafael Nadal in 2007? (complex informational)

Page 9: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Eye-tracking

Minimize user impact, maximize accuracy

Tobii TX300

unobtrusive

tracks natural head movement

300 Hz temporal resolution

accuracy up to 0.4° visual angle

size of each QAC suggestion on screen: 0.67°

23'' monitor

integrated eye-tracker

http://www.tobii.com/Global/Analysis/Downloads/Product_Descriptions/Tobii_TX300_EyeTracker_Product_Description.pdf
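To relate the 0.4° tracker accuracy to the 0.67° height of a suggestion, it can help to convert visual angles into on-screen sizes. The sketch below uses the standard relation θ = 2·atan(s / 2d). The viewing distance of 65 cm is an assumption for illustration only; the slides do not state it, and the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of the given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_on_screen_cm(angle_deg, distance_cm):
    """On-screen size (cm) that subtends the given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


ASSUMED_DISTANCE_CM = 65.0  # assumption, not taken from the slides

suggestion_cm = size_on_screen_cm(0.67, ASSUMED_DISTANCE_CM)
accuracy_cm = size_on_screen_cm(0.4, ASSUMED_DISTANCE_CM)
print(f"suggestion height: {suggestion_cm:.2f} cm")
print(f"tracker accuracy:  {accuracy_cm:.2f} cm")
```

Under this assumption the best-case tracking error is smaller than one suggestion, which is what makes it possible to attribute a fixation to an individual rank.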

Page 10: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Studying natural query formulation?

Make searchers type: Provide instructions and search task descriptions on screen (avoid copy-paste).

Participants: 25, diverse backgrounds, level of education, and computer experience.

Instruction: Participate in a study of search quality; start search from bing.com, then search any way you like.

Page 11: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Outline

Experiment

Analysis

Results

Discussion

Page 12: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Data

Collected:

eye fixations + saccades (on QAC and other parts of the screen)

mouse clicks, keystrokes

visited URLs

screen capture videos

browser events

Processed:

excluded 19 episodes where users did not search using Bing

result: 331 valid search episodes

extracted 10 measurements to characterize QAC examination, query formulation, and task completion

Page 13: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Video

Page 14: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Measurements

[Figure: timeline of an example search episode with events labelled Q1–Q3, R1–R5, and S1–S4, and the typed characters "T E T S T _ Q U E". Legend: fixation (anywhere on the screen), saccade (anywhere on the screen), mouse click, typed character, control characters, QAC suggestions shown, fixations on QAC suggestions.]

Timing measurements marked on the timeline:

task completion time (TCT)

time to first result click (TFC)

query formulation time (QFT)

time to first fixation (TFF)

cumulative fixation time (CFT) = A + B, where A and B are the intervals marked in the figure

+ query and task characteristics:

QU: QAC suggestion used

QR: QAC rank

QL: query length

CS: characters saved

UQ: unique queries submitted

UR: unique result pages
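The two fixation-based measurements can be sketched as follows. This is a minimal illustration, not the study's processing pipeline: it assumes fixations are already labelled as on or off the QAC suggestions, and that TFF is measured from the moment suggestions are first shown. The `Fixation` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    start_ms: int
    end_ms: int
    on_qac: bool  # True if the fixation lands on a QAC suggestion


def cumulative_fixation_time(fixations: List[Fixation]) -> int:
    """CFT: total duration of all fixations on QAC suggestions (A + B + ...)."""
    return sum(f.end_ms - f.start_ms for f in fixations if f.on_qac)


def time_to_first_fixation(fixations: List[Fixation],
                           qac_shown_ms: int) -> Optional[int]:
    """TFF: delay between suggestions appearing and the first QAC fixation.

    Returns None when the suggestions were never fixated (CFT = 0).
    """
    starts = [f.start_ms for f in fixations
              if f.on_qac and f.start_ms >= qac_shown_ms]
    return min(starts) - qac_shown_ms if starts else None


episode = [
    Fixation(100, 300, on_qac=False),
    Fixation(500, 700, on_qac=True),    # interval A
    Fixation(900, 1000, on_qac=False),
    Fixation(1200, 1500, on_qac=True),  # interval B
]
print(cumulative_fixation_time(episode))
print(time_to_first_fixation(episode, qac_shown_ms=400))
```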

Page 15: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Analysis using mixed effects model

Model random effects of participant and task, and fixed effect of condition on each response variable:

$$g(y_{ij}) = \beta_0 + \beta_1 x_{ij} + p_i u_i + t_j v_j + \varepsilon_{ij}$$

$g$: link function (e.g. logit for binary response)

$y_{ij}$: response for participant $i$ and task $j$

$\beta_0$: condition effect (base level)

$\beta_1$: condition effect (random condition)

$x_{ij}$: condition indicator

$p_i u_i$: participant term (participant indicator and effect of participant)

$t_j v_j$: task term (task indicator and effect of task)

$\varepsilon_{ij}$: residual noise
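Because each response is modelled on the scale of its link function, coefficients have to be mapped back through the inverse link to be read in natural units. The sketch below assumes that the base-level estimate is the inverse link of β0, that the estimate for the random condition is the inverse link of β0 + β1, and that the "log" and "Poisson" response types both use a log link. The helper names are hypothetical; the coefficients are those reported for QAC examination on the results slides.

```python
import math


def inv_logit(z):
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumed mapping from response type to inverse link function.
INVERSE_LINK = {
    "binary": inv_logit,
    "log": math.exp,
    "poisson": math.exp,
}


def condition_estimates(response_type, beta0, beta1):
    """Return (original-condition, random-condition) estimates."""
    inverse = INVERSE_LINK[response_type]
    return inverse(beta0), inverse(beta0 + beta1)


# CFT > 0 (binary): probability that QAC suggestions are fixated at all.
p_original, p_random = condition_estimates("binary", 3.468, -0.220)
print(round(p_original, 2), round(p_random, 2))  # 0.97 0.96

# CFT | CFT > 0 (log): cumulative fixation time in milliseconds.
cft_original, cft_random = condition_estimates("log", 7.124, -0.043)
print(round(cft_original), round(cft_random))
```

The back-transformed values agree with the estimate columns of the results tables up to rounding of the coefficients.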

Page 16: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Outline

Experiment

Analysis

Results

Discussion

Page 18: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Analysis: QAC examination

[Figure: episode timeline highlighting fixations on QAC suggestions, time to first fixation (TFF), and cumulative fixation time (CFT = A + B).]

response       type    n    β0      estimate (original)  β1      estimate (random)
CFT > 0        binary  331  3.468*  0.97                 -0.220  0.96
CFT | CFT > 0  log     284  7.124*  1241 ms              -0.043  1189 ms
TFF | CFT > 0  log     284  6.503*  667 ms               -0.094  607 ms

* marks coefficients that are estimated to differ significantly from zero.

Page 19: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Differences between conditions are much smaller than differences between ranks.

Fixations and use of AS by rank and condition. Condition has little effect, suggesting a strong position bias.

[Figure: mean fixation time (milliseconds) and AS usage (percent) by AS suggestion rank. Series: Fixations (original), Fixations (random), AS usage (original), AS usage (random).]

Page 21: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Analysis: query formulation

[Figure: episode timeline highlighting query formulation time (QFT), with mouse clicks, typed characters, and control characters marked.]

response  type     n    β0       estimate (original)  β1      estimate (random)
QFT       log      331  8.680*   5884 ms              0.058   6235 ms
QL        Poisson  331  3.224*   25                   -0.007  25
QU        binary   331  -0.915*  0.29                 -0.508  0.19
CS | QU   Poisson  99   2.192*   9                    0.223*  11
QR | QU   Poisson  99   0.344*   1.4                  0.044   1.5

* marks coefficients that are estimated to differ significantly from zero.

Page 23: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Analysis: task completion

[Figure: episode timeline highlighting task completion time (TCT) and time to first result click (TFC), with mouse clicks marked.]

response        type     n    β0       estimate (original)  β1      estimate (random)
UQ              Poisson  331  0.357*   1.4                  0.044   1.5
UR = 0          binary   331  -3.654*  0.03                 -0.022  0.02
UR | UR > 0     Poisson  282  0.703*   2.0                  0.161*  2.4
TFC | UR > 0    log      282  8.625*   5569 ms              -0.036  5372 ms
TCT ≥ ts        binary   331  -3.217*  0.04                 0.764   0.08
TCT | TCT < ts  log      297  11.096*  65.9 s               -0.021  64.5 s

* marks coefficients that are estimated to differ significantly from zero.
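For responses modelled on a log scale, β1 acts multiplicatively, so exp(β1) can be read as the ratio between the random and the original condition. A short sketch, assuming a log link for the Poisson responses and using the one significant β1 on this slide (UR | UR > 0); the function name is hypothetical:

```python
import math


def rate_ratio(beta1):
    """Multiplicative change implied by beta1 under a log link."""
    return math.exp(beta1)


# UR | UR > 0: unique result pages visited, given at least one visit.
ratio = rate_ratio(0.161)
print(f"ratio random / original: {ratio:.2f}")
print(f"relative increase: {100 * (ratio - 1):.0f}%")
```

Under these assumptions, searchers who visited any result page visited roughly 17% more unique result pages in the random condition.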

Page 24: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Outline

Experiment

Analysis

Results

Discussion

Page 25: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

How do users interact with AS?

a) touch typing, aware of suggestions

Page 26: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

How do users interact with AS?

b + c) spelling support vs. expressing an information need

Page 27: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

How do users interact with AS?

d) seeking suggestions

Page 28: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Discussion

How to measure QAC ranking quality?

Rank-based (e.g., MRR, extracted from logs)

e.g., [Shokouhi ‘13]

QAC usage [Kharitonov et al. ‘13]

Manual judgment of suggestions [Bhatia et al. ‘11]

Result page quality [Liu et al. ‘12]

Effort-based (e.g., MKS) [Duan & Hsu ‘11]

AB-tests [Kohavi et al. ‘13]

Interleaving [Hofmann et al. ‘13]
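As an illustration of the rank-based family, mean reciprocal rank (MRR) can be sketched as below. The log format (one pair of suggestion list and finally submitted query per entry) and the function names are assumptions for illustration; the example lists are the two rankings shown for the prefix "massachu" earlier in the deck.

```python
def reciprocal_rank(suggestions, submitted_query):
    """1 / rank of the submitted query in the list, or 0.0 if it is absent."""
    for rank, suggestion in enumerate(suggestions, start=1):
        if suggestion == submitted_query:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(log):
    """Average reciprocal rank over (suggestions, submitted_query) pairs."""
    if not log:
        return 0.0
    return sum(reciprocal_rank(s, q) for s, q in log) / len(log)


ORIGINAL = [
    "massachusetts",
    "massachusetts state lottery",
    "massachusetts unemployment",
    "massachusetts registry of motor vehicles",
    "massachusetts secretary of state",
    "massachusetts department of revenue",
    "massachusetts department of education",
    "massachusetts general hospital",
]
RANDOM = [
    "massachusetts unemployment",
    "massachusetts department of education",
    "massachusetts secretary of state",
    "massachusetts registry of motor vehicles",
    "massachusetts",
    "massachusetts general hospital",
    "massachusetts department of revenue",
    "massachusetts state lottery",
]

query = "massachusetts general hospital"
print(reciprocal_rank(ORIGINAL, query), reciprocal_rank(RANDOM, query))
```

For this single query the shuffled list scores higher than the production list, which shows that a rank-based metric needs many logged queries before it says anything about overall ranking quality.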

Page 29: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

Summary

To learn from user interactions, we need to understand how to interpret them.

Here: focus on effects of ranking changes on user interactions with AS.

Found evidence of strong position bias (no differences in examination / positional AS use), but strong effect on query effectiveness (e.g., # unique pages).

Next: incorporate findings into metrics for evaluation and learning, e.g., can we detect examinations from typing behavior?

Page 30: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

References

[Bhatia et al. ‘11] S. Bhatia, D. Majumdar, P. Mitra: Query suggestions in the absence of query logs (SIGIR 2011).

[Duan & Hsu ‘11] H. Duan, B.-J. P. Hsu: Online spelling correction for query completion (WWW 2011).

[Hofmann et al. ‘13] K. Hofmann, S. Whiteson, M. de Rijke: Fidelity, soundness, and efficiency of interleaved comparison methods (ACM TOIS 31(4) 2013).

[Hofmann et al. ‘14] K. Hofmann, B. Mitra, M. Shokouhi, F. Radlinski: An Eye-tracking Study of User Interactions with Query Auto Completion (CIKM 2014).

[Kharitonov et al. ‘13] E. Kharitonov, C. Macdonald, P. Serdyukov, I. Ounis: User Model-based Metrics for Offline Query Suggestion Evaluation (CIKM 2013).

[Kohavi et al. ‘13] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu, N. Pohlmann: Online controlled experiments at large scale (KDD 2013).

[Li et al. ‘14] Y. Li, A. Dong, H. Wang, H. Deng, Y. Chang, C. Zhai: A Two-Dimensional Click Model for Query Auto-Completion (SIGIR 2014).

[Liu et al. ‘12] Y. Liu, R. Song, Y. Chen, J.-Y. Nie, J.-R. Wen: Adaptive query suggestion for difficult queries (SIGIR 2012).

[Mitra et al. ‘14] B. Mitra, M. Shokouhi, F. Radlinski, K. Hofmann: On User Interactions with Query Auto-Completion (SIGIR 2014).

[Shokouhi ‘13] M. Shokouhi: Learning to Personalize Query Auto-Completion (SIGIR 2013).

Page 31: An Eye-tracking Study of User Interactions with Query Auto Completion – Katja Hofmann, Microsoft Research Cambridge

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.