Learning to Rank: New Techniques and Applications
Martin Szummer, Microsoft Research
Cambridge, UK
Why learning to rank?
• Current rankers use many features, in complex combinations
• Applications
  – Web search ranking, enterprise search
  – Image search
  – Ad selection
  – Merging multiple results lists
• The good: uses training data to find combinations of features that optimize IR metrics
• The bad: requires judged training data, which is expensive, subjective, not provided by end-users, and goes out-of-date
This talk
• Learning to rank with IR metrics: a single, simple yet competition-winning recipe. Works for NDCG, MAP, and Precision, with linear or non-linear ranking functions (neural nets, boosted trees, etc.)
• Semi-supervised ranking: a new technique. Reduces the amount of judged training data required.
• Learning to merge. Application: merging results lists from multiple query reformulations.
Actually, I apply the same recipe in three different settings!
Ranking Background
• Classification: determine the class of an item i (operates on individual items)
• Ranking: determine the preference of item i versus j (operates on pairs of items)
• Ranking function: a score function $s_i = f(\mathbf{x}_i; \mathbf{w})$, where $\mathbf{x}_i$ are query-document features and $\mathbf{w}$ are parameters.
  Example: linear function $f(\mathbf{x}_i; \mathbf{w}) = \mathbf{w}^\top \mathbf{x}_i$.
• The ranking function induces a preference: $i \succ j$ when $s_i > s_j$.
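For concreteness, a minimal sketch of this scoring setup (the feature values and weights below are toy numbers, not from the talk):

import numpy as np

def score(X, w):
    """Linear ranking function: s_i = w . x_i over query-document features x_i."""
    return X @ w

# Toy query-document feature matrix (rows = documents) and parameter vector.
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.5]])
w = np.array([1.0, 0.5])
s = score(X, w)
ranking = np.argsort(-s)   # sort documents by descending score
print(s, ranking)          # doc i is preferred to doc j when s_i > s_j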
From Ranking Function to the Ranking
• Applying the ranking function to define a ranking: sort $\{s_i\}$ in decreasing order.
• Above: a deterministic model of preference. Henceforth: a probabilistic model that translates score differences into a probability of preference (Bradley-Terry / Mallows):
  $P(i \succ j) = \frac{1}{1 + e^{-(s_i - s_j)}}$
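A small sketch of that probabilistic preference model; the scale parameter sigma is an assumed hyperparameter:

import numpy as np

def pref_prob(s_i, s_j, sigma=1.0):
    """Bradley-Terry style model: P(i preferred to j) from the score difference."""
    return 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))

print(pref_prob(2.0, 1.5))   # > 0.5: i is more likely to be preferred
print(pref_prob(1.0, 1.0))   # = 0.5: no preference either way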
Learning to Rank
• Learning to rank: given query-document features and preference pairs $i \succ j$ in the training data, determine $\mathbf{w}$; then rank by sorting $\{s_i\}$.
• Maximize the likelihood of the preference pairs given in training data:
  $C = \sum_{i,j} \mathbb{1}[(i \succ j) \in \text{train}] \, \log P(i \succ j)$
• e.g. the RankNet model [Burges et al. 2005]
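A sketch of this training for a linear scorer, assuming plain gradient descent and toy data; it minimizes the pairwise negative log-likelihood of the judged pairs (the RankNet-style cross-entropy), which is equivalent to maximizing the likelihood above:

import numpy as np

def neg_log_likelihood(w, X, pairs, sigma=1.0):
    """Negative log-likelihood of preference pairs (i, j) meaning doc i > doc j."""
    s = X @ w
    diffs = np.array([s[i] - s[j] for i, j in pairs])
    return np.sum(np.log1p(np.exp(-sigma * diffs)))

def grad(w, X, pairs, sigma=1.0):
    s = X @ w
    g = np.zeros_like(w)
    for i, j in pairs:
        lam = -sigma / (1.0 + np.exp(sigma * (s[i] - s[j])))  # dC/ds_i = -dC/ds_j
        g += lam * (X[i] - X[j])
    return g

# Toy data: three docs for one query, doc 0 preferred to docs 1 and 2.
X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.5]])
pairs = [(0, 1), (0, 2)]
w = np.zeros(2)
for _ in range(200):              # plain gradient descent on the pairwise loss
    w -= 0.1 * grad(w, X, pairs)
print(w, X @ w)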
Learning to Rank for IR metrics
• IR metrics such as NDCG, MAP or Precision depend on:
  – the sorted order of items
  – the ranks of items: weight the top of the ranking more
Recipe
1) Express the metric as a sum of pairwise swap deltas
2) Smooth it by multiplying by a Bradley-Terry term
3) Optimize parameters by gradient descent over a judged training set
LambdaRank & LambdaMART [Burges et al] are instances of this recipe. The latter won the Yahoo! Learning to rank challenge (2010).
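A sketch of the recipe's core step in the spirit of LambdaRank's published form for NDCG: each pair's Bradley-Terry gradient is weighted by the |ΔNDCG| obtained by swapping the two documents. The graded labels, toy inputs, and gradient-ascent convention are illustrative; this is not the unpublished material referenced on the next slide.

import numpy as np

def lambda_gradients(scores, labels, sigma=1.0):
    """Per-document gradients: pairwise NDCG swap deltas smoothed by a Bradley-Terry term."""
    n = len(scores)
    order = np.argsort(-scores)
    rank = np.empty_like(order); rank[order] = np.arange(n)   # 0-based rank of each doc
    gains = 2.0 ** labels - 1.0
    ideal = np.sum(np.sort(gains)[::-1] / np.log2(np.arange(2, n + 2)))  # ideal DCG
    lambdas = np.zeros_like(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                         # only pairs where i should rank above j
            # |ΔNDCG| if documents i and j swapped rank positions
            delta = abs((gains[i] - gains[j]) *
                        (1 / np.log2(rank[i] + 2) - 1 / np.log2(rank[j] + 2))) / ideal
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))   # Bradley-Terry smoothing
            lambdas[i] += sigma * delta * rho    # push i up
            lambdas[j] -= sigma * delta * rho    # push j down
    return lambdas   # feed into gradient ascent on the scoring function's parameters

print(lambda_gradients(np.array([0.1, 0.4, 0.2]), np.array([2, 0, 1])))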
Example: Apply recipe to NDCG metric
Unpublished material. Email me if interested.
Gradients - intuition
• Gradients act as forces on doc pairs
[Figure: documents in a ranked list (positions 1-5), with pairwise gradients $dC/ds_{ij}$ shown as forces acting on document pairs.]
Semi-supervised Ranking
[with Emine Yilmaz]
Train with judged AND unjudged query-document pairs
Semi-supervised Ranking
• Applications
  – (Pseudo) relevance feedback
  – Reduce the number of (expensive) human judgments
  – Use when judgments are hard to obtain
• Customers may not want to judge their collections
  – adaptation to a specific company in enterprise search
  – ranking for small markets and special-interest domains
• Approach
  – preference learning
  – end-to-end optimization of ranking metrics (NDCG, MAP)
  – multiple and completely unlabeled rank instances
  – scalability
How to benefit from unlabeled data?
Unlabeled data gives information about the data distribution P(x). We must make assumptions about what the structure of the unlabeled data tells us about the ranking distribution P(R|x).
A common assumption is the cluster assumption: unlabeled data defines the extent of clusters, and labeled data determines the class/function value of each cluster.
Semi-supervised
– classification: similar documents ⇒ same class
– regression: similar documents ⇒ similar function value
– ranking: similar documents ⇒ similar preference, i.e. neither is preferred to the other
• Differences from classification & regression:
  – Preferences provide weaker constraints than function values or classes
  – The unlabeled similarity term is a type of regularizer on the function we are learning
  – Similarity can be defined based on content; it does not require judgments
Quantify Similarity
similar documents ⇒ similar preference, i.e. neither is preferred to the other
Unpublished material. Email me if interested.
Semi-supervised Gradients
[Figure: labeled and unlabeled document pairs with their gradient forces $dC_L/ds_{ij}$ and $dC_U/ds_{ij}$.]
Combined gradient on a document pair:
$\frac{dC}{ds_{ij}} = \frac{dC_L}{ds_{ij}} + \beta \, \frac{dC_U}{ds_{ij}}$
where $C_L$ is the cost over labeled (judged) pairs, $C_U$ the cost over unlabeled pairs, and $\beta$ weights the unlabeled term.
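A sketch of how the two gradient terms could be combined. The labeled term is the pairwise Bradley-Terry gradient from earlier; the specific form of the unlabeled term here (pulling the scores of content-similar neighbor documents together) is an illustrative assumption consistent with the "similar documents ⇒ similar preference" idea, not the exact unpublished formulation.

import numpy as np

def labeled_gradient(scores, judged_pairs, sigma=1.0):
    """dC_L/ds: pairwise gradients from judged preference pairs (i, j) meaning i > j."""
    g = np.zeros_like(scores)
    for i, j in judged_pairs:
        rho = sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
        g[i] += rho
        g[j] -= rho
    return g

def unlabeled_gradient(scores, neighbor_pairs):
    """dC_U/ds: illustrative regularizer pushing similar (neighboring) docs toward equal scores."""
    g = np.zeros_like(scores)
    for i, j in neighbor_pairs:
        g[i] -= (scores[i] - scores[j])
        g[j] += (scores[i] - scores[j])
    return g

def combined_gradient(scores, judged_pairs, neighbor_pairs, beta=0.5):
    # dC/ds = dC_L/ds + beta * dC_U/ds
    return labeled_gradient(scores, judged_pairs) + beta * unlabeled_gradient(scores, neighbor_pairs)

print(combined_gradient(np.array([0.3, 0.1, 0.2]), [(0, 1)], [(1, 2)]))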
Experiments
Relevance Feedback task:
1) user issues a query and labels a few of the resulting documents from a traditional ranker (BM25)
2) system trains a query-specific ranker, and re-ranks
Data: TREC collection. 528,000 documents, 150 queries; 1000 total documents per query; 2-15 docs are labeled.
Features:
– ranking features (q, d): 22 features from LETOR
– content features (d1, d2): TF-IDF distance between top 50 words
Neighbors in input space using either of the above.
Note: at test time, only ranking features are used; the method allows using features of type (d1, d2) and (q, d1, d2) at training time that other algorithms cannot use.
Ranking function f(): neural network, 3 hidden units; K=5 neighbors
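A sketch of how the K=5 content neighbors could be computed; the TF-IDF vectorizer settings, the cosine metric, and the placeholder corpus are assumptions for illustration, not details stated in the slides:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

docs = ["...document texts retrieved for the query..."] * 6   # placeholder corpus

# TF-IDF over a small vocabulary, standing in for the "top 50 words" content features.
tfidf = TfidfVectorizer(max_features=50).fit_transform(docs)

# K=5 nearest neighbors by cosine distance define the "similar document" pairs
# used only at training time by the unlabeled objective.
nn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(tfidf)
dist, idx = nn.kneighbors(tfidf)
neighbor_pairs = [(i, j) for i, row in enumerate(idx) for j in row if j != i]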
Relevance Feedback Task
[Figure: NDCG(10) (0.1-0.6) vs. number of labeled documents (2, 3, 5, 10, 15) for LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, TSVM L&U, RankBoost L&U, RankingSVM L, and RankBoost L.]
Novel Queries Task
90,000 training documents; 3,500 preference pairs
40 million unlabeled pairs
Novel Queries Task
[Figure: NDCG(10) (0.1-0.5) vs. number of labeled preference pairs (10^2 to 10^3) for LambdaRank L&U Cont, LambdaRank L&U, LambdaRank L, and an upper bound.]
Learning to Merge
Task: learn a ranker that merges results from other rankers
Example application:
– users do not know the best way to express their web search query
– a single query may not be enough to reach all relevant documents
Solution: the user issues a query (e.g. "wp7"); reformulate it in parallel (e.g. "wp7 phone", "microsoft wp7"); then merge the results.
Merging Multiple Queries [with Sheldon, Shokouhi, Craswell]
• Traditional approach: alter the query before retrieval
• Merging: alter after retrieval
  – Prospecting: see results first, then decide
  – Flexibility: any rewrite is allowed, arbitrary features
  – Upside potential: better than any individual list
  – Increased query load on the engine: use a cache to mitigate it
LambdaMerge: learn to merge
A weighted mixture of ranking functions
Rewrite features:
– Rewrite-difficulty: ListMean, ListStd, Clarity
– Rewrite-drift: IsRewrite, RewriteRank, RewriteScore, Overlap@N
Scoring features: Dynamic rank score, BM25, Rank, IsTopN
[Figure: merging architecture for two rewrites, "jupiters mass" and "mass of jupiter": each result list supplies score features, and rewrite features drive the mixture weights.]
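A sketch of the weighted-mixture idea: a gate computed from rewrite features mixes per-list scores computed from the scoring features. The softmax gate, linear scorers, and all feature values below are illustrative assumptions rather than the exact published architecture; the parameters would be trained with the same lambda-gradient recipe used earlier in the talk.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def merged_score(doc_score_feats, rewrite_feats, w_score, w_gate):
    """Mixture score for one document appearing in several rewrite result lists.

    doc_score_feats: [n_rewrites, n_score_feats]   (BM25, rank, dynamic rank score, ...)
    rewrite_feats:   [n_rewrites, n_rewrite_feats] (ListMean, Clarity, Overlap@N, ...)
    """
    per_list_scores = doc_score_feats @ w_score   # f(score features) for each list
    gate = softmax(rewrite_feats @ w_gate)        # weight each rewrite's contribution
    return gate @ per_list_scores                 # weighted mixture

# Two rewrites ("jupiters mass", "mass of jupiter"), toy features and parameters.
score_feats = np.array([[1.2, 0.3], [0.8, 0.9]])
rewrite_feats = np.array([[0.5, 0.1], [0.4, 0.7]])
print(merged_score(score_feats, rewrite_feats, np.array([1.0, 0.5]), np.array([0.2, 1.0])))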
λ-Merge Results
[Figure: scatter plot of Merged – Original NDCG vs. Reformulation – Original NDCG.]
Summary
• Learning to Rank
  – An indispensable tool
  – Requires judgments: but semi-supervised learning can help; crowd-sourcing is also a possibility; research frontier: implicit judgments from clicks
  – Many applications beyond those shown
    • Merging: multiple local search engines, multiple language engines
    • Rank recommendations in collaborative filtering
    • Many thresholding tasks (filtering) can be posed as ranking
    • Rank ads for relevance
    • Elections
– Use it!