
Page 1: Optimizing Ranking Models in an Online Setting

Optimizing Ranking Models in an Online Setting

Harrie Oosterhuis and Maarten de Rijke

April 16, 2019

University of Amsterdam

[email protected], [email protected]

https://staff.fnwi.uva.nl/h.r.oosterhuis

Page 2: Optimizing Ranking Models in an Online Setting

Introduction

Page 3: Optimizing Ranking Models in an Online Setting

Learning to Rank in Information Retrieval

Learning to rank enables the optimization of ranking systems.

Traditionally, learning to rank uses annotated datasets:

• Relevance annotations for query-document pairs, provided by human judges.

There are problems with annotated datasets:

• expensive to make (Qin and Liu, 2013; Chapelle and Chang, 2011),

• sometimes impossible to create, e.g. for personal search (Wang et al., 2016),

• not necessarily aligned with actual user preferences (Sanderson, 2010), i.e., annotators and users often disagree.



Page 5: Optimizing Ranking Models in an Online Setting

Online Learning to Rank

Page 6: Optimizing Ranking Models in an Online Setting

Online Learning to Rank

Online learning to rank: learn by interacting with users.

This solves most of the problems of annotations:

• Interactions are virtually free if you have users.

• User behaviour gives implicit feedback.

These methods have to handle:

• Noise: users click for unexpected reasons.

• Biases: interactions are affected by position and selection bias.



Page 8: Optimizing Ranking Models in an Online Setting

Dueling Bandit Gradient Descent

Page 9: Optimizing Ranking Models in an Online Setting

Dueling Bandit Gradient Descent: Introduction

Introduced by Yue and Joachims (2009) as the first online learning to rank method.

Intuition:

• Interleaving can compare rankers from user interactions.

• By sampling model variants and comparing them with interleaving, the gradient of the model w.r.t. user satisfaction can be estimated (a minimal sketch follows below).

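To make this concrete, here is a minimal sketch of one DBGD update for a linear ranker. The interleaved_comparison helper, which would run an actual interleaved comparison on user clicks, and the step sizes delta and eta are illustrative assumptions, not the original paper's implementation:

```python
import numpy as np

def dbgd_step(w, interleaved_comparison, delta=1.0, eta=0.01):
    """One Dueling Bandit Gradient Descent update (sketch).

    w: weights of a linear ranker f(d) = w . x_d.
    interleaved_comparison(w, w_candidate) -> True if the candidate
    wins the interleaved comparison on user clicks (assumed helper).
    """
    # Sample a candidate ranker on the unit sphere around w.
    u = np.random.randn(*w.shape)
    u /= np.linalg.norm(u)
    w_candidate = w + delta * u

    # Show the interleaving of both rankers' results to the user;
    # if the candidate's results are preferred, step towards it.
    if interleaved_comparison(w, w_candidate):
        w = w + eta * u
    return w
```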

Pages 10-18: Optimizing Ranking Models in an Online Setting

Dueling Bandit Gradient Descent: Visualization

[Figure, built up across these slides: a point in the model's weight space (axes Weight #1 and Weight #2). For an incoming user query, the current model and a sampled variant each produce a ranking (Ranking A and Ranking B); interleaving merges them into the displayed results; the user sees and interacts with those results; and in the learning step the model's weights are updated towards the variant if its ranking won the comparison.]

Page 19: Optimizing Ranking Models in an Online Setting

Dueling Bandit Gradient Descent: Properties

Basis of the online learning to rank field: virtually all existing methods are extensions of this algorithm (Schuth et al., 2016; Hofmann et al., 2013; Zhao and King, 2016; Wang et al., 2018a).

Problems with Dueling Bandit Gradient Descent:

• A considerable gap between DBGD and the best possible performance.

• Ineffective at optimizing non-linear models.

Advantage of Dueling Bandit Gradient Descent:

• It has a theoretical foundation based on proven regret bounds.



Page 21: Optimizing Ranking Models in an Online Setting

Pairwise Differentiable Gradient Descent

Page 22: Optimizing Ranking Models in an Online Setting

Pairwise Differentiable Gradient Descent

We recently introduced Pairwise Differentiable Gradient Descent (Oosterhuis and de Rijke, 2018).

Intuition:

• A pairwise approach can be made unbiased while being differentiable.


Page 23: Optimizing Ranking Models in an Online Setting

Plackett-Luce Model

Pairwise Differentiable Gradient Descent optimizes a Plackett-Luce ranking model, which defines a probability distribution over documents.

With the ranking scoring model f(d), the distribution over the document set D is:

P(d \mid f, D) = \frac{\exp f(d)}{\sum_{d' \in D} \exp f(d')} \qquad (1)

Unlike DBGD, confidence is explicitly modelled.

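As an illustration of Eq. (1), a minimal sketch of sampling a ranking from the Plackett-Luce distribution: documents are drawn one by one, without replacement, with probability proportional to exp(f(d)). The document scores below are hypothetical:

```python
import numpy as np

def sample_ranking(scores, rng=None):
    """Sample a ranking from the Plackett-Luce model of Eq. (1):
    repeatedly draw a document with probability proportional to
    exp(f(d)) among the documents not yet placed."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        logits = scores[remaining] - scores[remaining].max()  # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        pick = rng.choice(len(remaining), p=probs)
        ranking.append(remaining.pop(pick))
    return ranking

# Example with hypothetical scores f(d) for five documents.
print(sample_ranking([2.0, 1.0, 0.5, 0.0, -1.0]))
```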

Page 24: Optimizing Ranking Models in an Online Setting

Bias in Pairwise Inference

Similar to existing pairwise methods (Oosterhuis and de Rijke, 2017; Joachims, 2002), Pairwise Differentiable Gradient Descent infers pairwise document preferences from user clicks:

[Figure: a displayed list of five documents with clicks, illustrating the pairwise preferences inferred between clicked and unclicked documents.]

This inference is based on a cascading assumption: the user is assumed to scan the results from top to bottom.

PDGD weighs the inferred preferences to account for position and selection bias (a sketch follows below).

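A hedged sketch of this kind of click-based preference inference, under one common reading of the cascade assumption: every document up to the last click, plus the first unclicked document after it, is assumed observed, and clicked documents are preferred over observed unclicked ones. The exact observation rule used by PDGD may differ in its details:

```python
def infer_pair_preferences(ranking, clicked):
    """Infer pairwise preferences from clicks under a cascading
    assumption (sketch): the user scans top-down, so unclicked
    documents above the last click (and the first unclicked
    document after it) are assumed observed but not preferred.

    ranking: list of document ids in displayed order.
    clicked: set of clicked document ids.
    Returns (preferred, over) pairs.
    """
    if not clicked:
        return []
    last_click_pos = max(i for i, d in enumerate(ranking) if d in clicked)
    # Documents assumed observed: everything up to the last click,
    # plus the first unclicked document below it (if any).
    observed = ranking[:last_click_pos + 1]
    if last_click_pos + 1 < len(ranking):
        observed.append(ranking[last_click_pos + 1])
    return [(c, d) for c in observed if c in clicked
            for d in observed if d not in clicked]

# Example: clicks on documents 2 and 4 in a five-document list.
print(infer_pair_preferences([1, 2, 3, 4, 5], clicked={2, 4}))
```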


Pages 26-32: Optimizing Ranking Models in an Online Setting

Pairwise Differentiable Gradient Descent: Visualization

[Figure, built up across these slides: for a user query over the document collection, the model's learned distribution is used to sample the documents shown as the displayed results; the user sees and interacts with them; pair preferences are inferred from the clicks; for each inferred pair the reversed pair ranking is considered; and an unbiased update to the model is computed.]
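The reversed pair rankings drive the debiasing: each inferred preference between documents d_k and d_l is weighted by how probable the ranking with that pair swapped (R*) is relative to the displayed ranking R, i.e. rho(d_k, d_l, R) = P(R*) / (P(R) + P(R*)), following the formula reported by Oosterhuis and de Rijke (2018). A sketch of this weight; the helper functions are our own illustration:

```python
import numpy as np

def log_pl_prob(ranking, scores):
    """Log-probability of a full ranking under the Plackett-Luce
    model of Eq. (1): product over positions of the softmax of the
    still-unplaced documents."""
    s = np.array([scores[d] for d in ranking], dtype=float)
    logp = 0.0
    for i in range(len(s)):
        tail = s[i:] - s[i:].max()  # shift for numerical stability
        logp += tail[0] - np.log(np.exp(tail).sum())
    return logp

def pair_weight(ranking, k, l, scores):
    """Debiasing weight rho for an inferred preference between the
    documents at positions k and l: the probability of the ranking
    with the pair swapped, relative to both rankings together."""
    swapped = list(ranking)
    swapped[k], swapped[l] = swapped[l], swapped[k]
    lp, lp_swap = log_pl_prob(ranking, scores), log_pl_prob(swapped, scores)
    return np.exp(lp_swap) / (np.exp(lp) + np.exp(lp_swap))

# Example: weight for swapping positions 0 and 2 of a displayed ranking,
# with hypothetical document scores.
scores = {"a": 1.2, "b": 0.3, "c": -0.5}
print(pair_weight(["a", "b", "c"], 0, 2, scores))
```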

Page 33: Optimizing Ranking Models in an Online Setting

Pairwise Differentiable Gradient Descent: Properties

Pairwise Differentiable Gradient Descent does not rely on online evaluation or on the sampling of models.

Advantages claimed by previous work (Oosterhuis and de Rijke, 2018):

• Much better point of convergence, i.e. long-term performance.

• Much faster learning, i.e. short-term performance.

• Computationally much faster.


Page 34: Optimizing Ranking Models in an Online Setting

Limitations of Existing Results

Page 35: Optimizing Ranking Models in an Online Setting

Limitations of Existing Results

Comparison based on previous work:

• Pairwise Differentiable Gradient Descent outperforms DBGD.

• Dueling Bandit Gradient Descent has proven sublinear regret bounds.

Problems with the current state of affairs:

• Past comparisons are based on low-noise, cascading click models, and Pairwise Differentiable Gradient Descent assumes cascading behaviour!

• There is a conflict between the proven regret bounds of Dueling Bandit Gradient Descent and its observed performance.



Page 39: Optimizing Ranking Models in an Online Setting

Our contribution

First, we critically examine the regret bounds of Dueling Bandit Gradient Descent:

• We prove that its assumptions cannot hold for standard ranking models.

Then we reproduce the comparison under different conditions:

• Both cascading and non-cascading click behaviour.

• Simulated conditions ranging from ideal to extremely difficult, where we simulate near-random user behaviour.



Page 41: Optimizing Ranking Models in an Online Setting

Experimental Results

Page 42: Optimizing Ranking Models in an Online Setting

Experimental Setup

Simulations based on the largest available industry datasets:

• MSLR-WEB10k, Yahoo Webscope, Istella.

User behaviour is simulated using cascading and non-cascading click models.

Simulated behaviour ranges from:

• ideal: no noise, no position bias,

• extremely difficult: mostly noise, very high position bias.

We also run Dueling Bandit Gradient Descent with an oracle instead of interleaving, to see the maximum potential of better interleaving methods (a sketch of the click simulation follows below).

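For intuition, a minimal sketch of a cascading click simulation: each simulated user scans top-down, clicks with a relevance-dependent probability, and stops with some probability after a click. The probabilities below are hypothetical, not the parameters used in the paper; a non-cascading model would instead examine each position with a position-dependent probability:

```python
import random

# Hypothetical click and stop probabilities per relevance label (0-4);
# the paper's exact click-model parameters may differ.
P_CLICK = {0: 0.05, 1: 0.3, 2: 0.5, 3: 0.7, 4: 0.95}
P_STOP  = {0: 0.0,  1: 0.2, 2: 0.3, 3: 0.5, 4: 0.9}

def simulate_cascading_clicks(ranking, relevance):
    """Simulate one user session on a displayed ranking.
    ranking: list of document ids; relevance: dict id -> label."""
    clicks = []
    for doc in ranking:
        rel = relevance[doc]
        if random.random() < P_CLICK[rel]:
            clicks.append(doc)
            if random.random() < P_STOP[rel]:
                break  # the user is satisfied and stops scanning
    return clicks

# Example session over five documents with hypothetical labels.
print(simulate_cascading_clicks([1, 2, 3, 4, 5],
                                {1: 3, 2: 0, 3: 2, 4: 0, 5: 1}))
```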


Page 45: Optimizing Ranking Models in an Online Setting

Comparison: Dueling Bandit Gradient Descent

[Figure: NDCG (0.30-0.45) over 10^6 iterations for DBGD(perfect), DBGD(casc.), DBGD(non-casc.), and DBGD(oracle).]

Results from simulations on the MSLR-WEB10k dataset.


Page 46: Optimizing Ranking Models in an Online Setting

Comparison: Pairwise Differentiable Gradient Descent

[Figure: NDCG (0.30-0.45) over 10^6 iterations for PDGD(perfect), PDGD(casc.), and PDGD(non-casc.).]

Results from simulations on the MSLR-WEB10k dataset.


Page 47: Optimizing Ranking Models in an Online Setting

Comparison: Complete

[Figure: NDCG (0.30-0.45) over 10^6 iterations for all variants: PDGD(perfect), PDGD(casc.), PDGD(non-casc.), DBGD(perfect), DBGD(casc.), DBGD(non-casc.), and DBGD(oracle).]

Results from simulations on the MSLR-WEB10k dataset.

Page 48: Optimizing Ranking Models in an Online Setting

Comparison: Complete

[Figure: NDCG (0.650-0.750) over 10^6 iterations for the same PDGD and DBGD variants.]

Results from simulations on the Yahoo Webscope dataset.

Page 49: Optimizing Ranking Models in an Online Setting

Comparison: Complete

[Figure: NDCG (0.35-0.60) over 10^6 iterations for the same PDGD and DBGD variants.]

Results from simulations on the Istella dataset.

Page 50: Optimizing Ranking Models in an Online Setting

Conclusion

Page 51: Optimizing Ranking Models in an Online Setting

Conclusion

In this reproducibility paper we have:

• shown that an existing proof for regret bounds is unsound;

• reproduced a comparison between Pairwise Differentiable Gradient Descent and Dueling Bandit Gradient Descent, and generalized its conclusions from ideal circumstances to extremely difficult circumstances;

• shown that under all experimental conditions we could simulate, Pairwise Differentiable Gradient Descent outperforms previous methods by large margins.

Please continue our work: https://github.com/HarrieO/OnlineLearningToRank



Page 57: Optimizing Ranking Models in an Online Setting

References i

Q. Ai, K. Bi, C. Luo, J. Guo, and W. B. Croft. Unbiased learning to rank with unbiased propensity estimation. arXiv preprint arXiv:1804.05938, 2018.

O. Chapelle and Y. Chang. Yahoo! Learning to Rank Challenge overview. Journal of Machine Learning Research, 14:1–24, 2011.

K. Hofmann, A. Schuth, S. Whiteson, and M. de Rijke. Reusing historical interaction data for faster online learning to rank for IR. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 183–192. ACM, 2013.

T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.

T. Joachims, A. Swaminathan, and T. Schnabel. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 781–789. ACM, 2017.

Page 58: Optimizing Ranking Models in an Online Setting

References ii

D. Lefortier, P. Serdyukov, and M. de Rijke. Online exploration for detecting shifts in fresh intent. In CIKM 2014: 23rd ACM Conference on Information and Knowledge Management. ACM, November 2014.

H. Oosterhuis and M. de Rijke. Sensitive and scalable online evaluation with theoretical guarantees. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 77–86. ACM, 2017.

H. Oosterhuis and M. de Rijke. Differentiable unbiased online learning to rank. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1293–1302. ACM, 2018.

H. Oosterhuis and M. de Rijke. Optimizing ranking models in an online setting. In ECIR, 2019.

T. Qin and T.-Y. Liu. Introducing LETOR 4.0 datasets. arXiv preprint arXiv:1306.2597, 2013.

M. Sanderson. Test collection based evaluation of information retrieval systems. Foundations and Trends in Information Retrieval, 4(4):247–375, 2010.

Page 59: Optimizing Ranking Models in an Online Setting

References iii

A. Schuth, H. Oosterhuis, S. Whiteson, and M. de Rijke. Multileave gradient descent for fast online learning to rank. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 457–466. ACM, 2016.

H. Wang, R. Langley, S. Kim, E. McCord-Snook, and H. Wang. Efficient exploration of gradient space for online learning to rank. arXiv preprint arXiv:1805.07317, 2018a.

X. Wang, M. Bendersky, D. Metzler, and M. Najork. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 115–124. ACM, 2016.

X. Wang, N. Golbandi, M. Bendersky, D. Metzler, and M. Najork. Position bias estimation for unbiased learning to rank in personal search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 610–618. ACM, 2018b.

Y. Yue and T. Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1201–1208. ACM, 2009.

Page 60: Optimizing Ranking Models in an Online Setting

References iv

T. Zhao and I. King. Constructing reliable gradient exploration for online learning to rank. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 1643–1652. ACM, 2016.

Page 61: Optimizing Ranking Models in an Online Setting

Acknowledgments

All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their employers and/or sponsors.
