Learning to Rank: New Techniques and Applications
Martin Szummer, Microsoft Research
Cambridge, UK
Why learning to rank?
• Current rankers use many features, in complex combinations
• Applications
  – Web search ranking, enterprise search
  – Image search
  – Ad selection
  – Merging multiple results lists
• The good: uses training data to find combinations of features that optimize IR metrics
• The bad: requires judged training data, which is expensive, subjective, not provided by end-users, and goes out-of-date
This talk
• Learning to rank with IR metrics: a single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision, with linear or non-linear ranking functions (neural nets, boosted trees, etc.)
• Semi-supervised ranking: a new technique. Reduces the amount of judged training data required.
• Learning to merge. Application: merging results lists from multiple query reformulations.
Actually, I apply the same recipe in three different settings!
Ranking Background
• Classification: determine the class of an item i (operates on individual items)
• Ranking: determine the preference of item i versus j (operates on pairs of items)
• Ranking function: a score function $s_i = f(\mathbf{x}_i; \mathbf{w})$, where $\mathbf{x}_i$ are query-document features and $\mathbf{w}$ are parameters.
  Example: linear function $f(\mathbf{x}_i; \mathbf{w}) = \mathbf{w}^\top \mathbf{x}_i$.
• The ranking function induces a preference: $i \succ j$ when $s_i > s_j$.
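For concreteness, a minimal sketch of this scoring setup (the feature values and weights below are toy numbers, not from the talk):

import numpy as np

def score(X, w):
    """Linear ranking function: s_i = w . x_i over query-document features x_i."""
    return X @ w

# Toy query-document feature matrix (rows = documents) and parameter vector.
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.5]])
w = np.array([1.0, 0.5])
s = score(X, w)
ranking = np.argsort(-s)   # sort documents by descending score
print(s, ranking)          # doc i is preferred to doc j when s_i > s_j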
From Ranking Function to the Ranking
• Applying the ranking function to define a ranking: sort $\{s_i\}$ in decreasing order.
• Above: a deterministic model of preference. Henceforth: a probabilistic model that translates score differences into a probability of preference (Bradley-Terry / Mallows):
  $P(i \succ j) = \frac{1}{1 + e^{-(s_i - s_j)}}$
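A small sketch of that probabilistic preference model; the scale parameter sigma is an assumed hyperparameter:

import numpy as np

def pref_prob(s_i, s_j, sigma=1.0):
    """Bradley-Terry style model: P(i preferred to j) from the score difference."""
    return 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))

print(pref_prob(2.0, 1.5))   # > 0.5: i is more likely to be preferred
print(pref_prob(1.0, 1.0))   # = 0.5: no preference either way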
Learning to Rank
• Learning to rank: given query-document features and preference pairs $i \succ j$ in the training data, determine $\mathbf{w}$; then rank by sorting $\{s_i\}$.
• Maximize the likelihood of the preference pairs given in training data:
  $C = \sum_{i,j} \mathbb{1}[(i \succ j) \in \text{train}] \, \log P(i \succ j)$
• e.g. the RankNet model [Burges et al. 2005]
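A sketch of this training for a linear scorer, assuming plain gradient descent and toy data; it minimizes the pairwise negative log-likelihood of the judged pairs (the RankNet-style cross-entropy), which is equivalent to maximizing the likelihood above:

import numpy as np

def neg_log_likelihood(w, X, pairs, sigma=1.0):
    """Negative log-likelihood of preference pairs (i, j) meaning doc i > doc j."""
    s = X @ w
    diffs = np.array([s[i] - s[j] for i, j in pairs])
    return np.sum(np.log1p(np.exp(-sigma * diffs)))

def grad(w, X, pairs, sigma=1.0):
    s = X @ w
    g = np.zeros_like(w)
    for i, j in pairs:
        lam = -sigma / (1.0 + np.exp(sigma * (s[i] - s[j])))  # dC/ds_i = -dC/ds_j
        g += lam * (X[i] - X[j])
    return g

# Toy data: three docs for one query, doc 0 preferred to docs 1 and 2.
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.5]])
pairs = [(0, 1), (0, 2)]
w = np.zeros(2)
for _ in range(200):              # plain gradient descent on the pairwise loss
    w -= 0.1 * grad(w, X, pairs)
print(w, X @ w)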
Learning to Rank for IR metrics
• IR metrics such as NDCG, MAP or Precision depend on:
  – the sorted order of items
  – the ranks of items: weight the top of the ranking more
Recipe
1) Express the metric as a sum of pairwise swap deltas
2) Smooth it by multiplying by a Bradley-Terry term
3) Optimize parameters by gradient descent over a judged training set
LambdaRank & LambdaMART [Burges et al] are instances of this recipe. The latter won the Yahoo! Learning to rank challenge (2010).
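A sketch of the recipe's core step in the spirit of LambdaRank's published form for NDCG: each pair's Bradley-Terry gradient is weighted by the |ΔNDCG| obtained by swapping the two documents. The graded labels, toy inputs, and gradient-ascent convention are illustrative; this is not the unpublished material referenced on the next slide.

import numpy as np

def lambda_gradients(scores, labels, sigma=1.0):
    """Per-document gradients: pairwise NDCG swap deltas smoothed by a Bradley-Terry term."""
    n = len(scores)
    order = np.argsort(-scores)
    rank = np.empty_like(order); rank[order] = np.arange(n)   # 0-based rank of each doc
    gains = 2.0 ** labels - 1.0
    ideal = np.sum(np.sort(gains)[::-1] / np.log2(np.arange(2, n + 2)))  # ideal DCG
    lambdas = np.zeros_like(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                         # only pairs where i should rank above j
            # |ΔNDCG| if documents i and j swapped rank positions
            delta = abs((gains[i] - gains[j]) *
                        (1 / np.log2(rank[i] + 2) - 1 / np.log2(rank[j] + 2))) / ideal
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))   # Bradley-Terry smoothing
            lambdas[i] += sigma * delta * rho    # push i up
            lambdas[j] -= sigma * delta * rho    # push j down
    return lambdas   # feed into gradient ascent on the scoring function's parameters

print(lambda_gradients(np.array([0.1, 0.4, 0.2]), np.array([2, 0, 1])))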
Example: Apply recipe to NDCG metric
Unpublished material. Email me if interested.
Gradients - intuition
• Gradients act as forces on doc pairs
[Figure: documents in a ranked list (positions 1-5), with pairwise gradients $dC/ds_{ij}$ shown as forces acting on document pairs.]
Semi-supervised Ranking
[with Emine Yilmaz]
Train with judged AND unjudged query-document pairs
Semi-supervised Ranking
• Applications
  – (Pseudo) relevance feedback
  – Reduce the number of (expensive) human judgments
  – Use when judgments are hard to obtain
• Customers may not want to judge their collections
  – adaptation to a specific company in enterprise search
  – ranking for small markets and special-interest domains
• Approach
  – preference learning
  – end-to-end optimization of ranking metrics (NDCG, MAP)
  – multiple and completely unlabeled rank instances
  – scalability
How to benefit from unlabeled data?
Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x).
A common assumption is the cluster assumption: unlabeled data defines the extent of clusters, and labeled data determines the class/function value of each cluster.
Semi-supervised
– classification: similar documents ⇒ same class
– regression: similar documents ⇒ similar function value
– ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other
• Differences from classification & regression:
  – Preferences provide weaker constraints than function values or classes
  – The unlabeled similarity term is a type of regularizer on the function we are learning
  – Similarity can be defined based on content; it does not require judgments
Quantify Similarity
similar documents ⇒ similar preference, i.e. neither is preferred to the other
Unpublished material. Email me if interested.
Semi-supervised Gradients
[Figure: labeled and unlabeled document pairs with their gradient forces $dC_L/ds_{ij}$ and $dC_U/ds_{ij}$.]
Combined gradient on a document pair:
$\frac{dC}{ds_{ij}} = \frac{dC_L}{ds_{ij}} + \beta \, \frac{dC_U}{ds_{ij}}$
where $C_L$ is the cost over labeled (judged) pairs, $C_U$ the cost over unlabeled pairs, and $\beta$ weights the unlabeled term.
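A sketch of how the two gradient terms could be combined. The labeled term is the pairwise Bradley-Terry gradient from earlier; the specific form of the unlabeled term here (pulling the scores of content-similar neighbor documents together) is an illustrative assumption consistent with the "similar documents ⇒ similar preference" idea, not the exact unpublished formulation.

import numpy as np

def labeled_gradient(scores, judged_pairs, sigma=1.0):
    """dC_L/ds: pairwise gradients from judged preference pairs (i, j) meaning i > j."""
    g = np.zeros_like(scores)
    for i, j in judged_pairs:
        rho = sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
        g[i] += rho
        g[j] -= rho
    return g

def unlabeled_gradient(scores, neighbor_pairs):
    """dC_U/ds: illustrative regularizer pushing similar (neighboring) docs toward equal scores."""
    g = np.zeros_like(scores)
    for i, j in neighbor_pairs:
        g[i] -= (scores[i] - scores[j])
        g[j] += (scores[i] - scores[j])
    return g

def combined_gradient(scores, judged_pairs, neighbor_pairs, beta=0.5):
    # dC/ds = dC_L/ds + beta * dC_U/ds
    return labeled_gradient(scores, judged_pairs) + beta * unlabeled_gradient(scores, neighbor_pairs)

print(combined_gradient(np.array([0.3, 0.1, 0.2]), [(0, 1)], [(1, 2)]))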
Experiments
Relevance Feedback task:
1) user issues a query and labels a few of the resulting documents from a traditional ranker (BM25)
2) system trains a query-specific ranker, and re-ranks
Data: TREC collection. 528,000 documents, 150 queries; 1000 total documents per query; 2-15 docs are labeled.
Features:
– ranking features (q, d): 22 features from LETOR
– content features (d1, d2): TF-IDF distance between top 50 words
Neighbors in input space using either of the above.
Note: at test time, only ranking features are used; the method allows using features of type (d1, d2) and (q, d1, d2) at training time that other algorithms cannot use.
Ranking function f(): neural network, 3 hidden units; K=5 neighbors
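A sketch of how the K=5 content neighbors could be computed; the TF-IDF vectorizer settings, the cosine metric, and the placeholder corpus are assumptions for illustration, not details stated in the slides:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = ["...document texts retrieved for the query..."] * 6   # placeholder corpus

# TF-IDF over a small vocabulary, standing in for the "top 50 words" content features.
tfidf = TfidfVectorizer(max_features=50).fit_transform(docs)

# K=5 nearest neighbors by cosine distance define the "similar document" pairs
# used only at training time by the unlabeled objective.
nn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(tfidf)
dist, idx = nn.kneighbors(tfidf)
neighbor_pairs = [(i, j) for i, row in enumerate(idx) for j in row if j != i]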
Relevance Feedback Task
[Figure: NDCG(10) (0.1-0.6) vs. number of labeled documents (2, 3, 5, 10, 15) for LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, and RankBoost L.]
Novel Queries Task
90,000 training documents; 3,500 preference pairs
40 million unlabeled pairs
Novel Queries Task
[Figure: NDCG(10) (0.1-0.5) vs. number of labeled preference pairs (10^2 to 10^3) for LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, and an upper bound.]
Learning to Merge
Task: learn a ranker that merges results from other rankers
Example application:
– users do not know the best way to express their web search query
– a single query may not be enough to reach all relevant documents
Solution: the user issues a query (e.g. "wp7"); reformulate it in parallel (e.g. "wp7 phone", "microsoft wp7"); then merge the results.
Merging Multiple Queries [with Sheldon, Shokouhi, Craswell]
• Traditional approach: alter the query before retrieval
• Merging: alter after retrieval
  – Prospecting: see results first, then decide
  – Flexibility: any rewrite is allowed, arbitrary features
  – Upside potential: better than any individual list
  – Increased query load on the engine: use a cache to mitigate it
LambdaMerge: learn to merge
A weighted mixture of ranking functions
Rewrite features:
– Rewrite-difficulty: ListMean, ListStd, Clarity
– Rewrite-drift: IsRewrite, RewriteRank, RewriteScore, Overlap@N
Scoring features: Dynamic rank score, BM25, Rank, IsTopN
[Figure: merging architecture for two rewrites, "jupiters mass" and "mass of jupiter": each result list supplies score features, and rewrite features drive the mixture weights.]
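A sketch of the weighted-mixture idea: a gate computed from rewrite features mixes per-list scores computed from the scoring features. The softmax gate, linear scorers, and all feature values below are illustrative assumptions rather than the exact published architecture; the parameters would be trained with the same lambda-gradient recipe used earlier in the talk.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def merged_score(doc_score_feats, rewrite_feats, w_score, w_gate):
    """Mixture score for one document appearing in several rewrite result lists.

    doc_score_feats: [n_rewrites, n_score_feats]   (BM25, rank, dynamic rank score, ...)
    rewrite_feats:   [n_rewrites, n_rewrite_feats] (ListMean, Clarity, Overlap@N, ...)
    """
    per_list_scores = doc_score_feats @ w_score   # f(score features) for each list
    gate = softmax(rewrite_feats @ w_gate)        # weight each rewrite's contribution
    return gate @ per_list_scores                 # weighted mixture

# Two rewrites ("jupiters mass", "mass of jupiter"), toy features and parameters.
score_feats = np.array([[1.2, 0.3], [0.8, 0.9]])
rewrite_feats = np.array([[0.5, 0.1], [0.4, 0.7]])
print(merged_score(score_feats, rewrite_feats, np.array([1.0, 0.5]), np.array([0.2, 1.0])))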
λ-Merge Results
[Figure: scatter plot of Merged – Original NDCG vs. Reformulation – Original NDCG.]
Summary
• Learning to Rank
  – An indispensable tool
  – Requires judgments: but semi-supervised learning can help; crowd-sourcing is also a possibility; research frontier: implicit judgments from clicks
  – Many applications beyond those shown
    • Merging: multiple local search engines, multiple language engines
    • Rank recommendations in collaborative filtering
    • Many thresholding tasks (filtering) can be posed as ranking
    • Rank ads for relevance
    • Elections
– Use it!