A Support Vector Method for Optimizing Average Precision
SIGIR 2007
Yisong Yue
Cornell University
In Collaboration With:
Thomas Finley, Filip Radlinski, Thorsten Joachims
(Cornell University)
Motivation
• Learn to Rank Documents
• Optimize for IR performance measures
• Mean Average Precision
• Leverage Structural SVMs [Tsochantaridis et al. 2005]
MAP vs Accuracy
• Average precision is the average of the precision scores at the rank positions of each relevant document.
• Ex: [example ranking figure] has average precision …
• Mean Average Precision (MAP) is the mean of the average precision scores over a group of queries.
• A machine learning algorithm optimizing for accuracy might learn a very different model than one optimizing for MAP.
• Ex: [second example ranking figure] has average precision of about 0.64, but a max accuracy of 0.8, vs 0.6 for the ranking above.
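Average precision for a 0/1 relevance list is easy to compute directly; the helper below is an illustrative sketch (the function name and the example ranking are hypothetical, not taken from the slides):

```python
def average_precision(ranking):
    """AP of a ranked list, where ranking[k] is 1 if the document
    at rank k+1 is relevant and 0 otherwise."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions)

# Relevant documents at ranks 1, 3, and 5:
ap = average_precision([1, 0, 1, 0, 1])  # (1/1 + 2/3 + 3/5) / 3, about 0.76
```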
Recent Related Work
Greedy & Local Search
• [Metzler & Croft 2005] – optimized for MAP using gradient descent; expensive for large numbers of features.
• [Caruana et al. 2004] – iteratively built an ensemble to greedily improve arbitrary performance measures.

Surrogate Performance Measures
• [Burges et al. 2005] – used neural nets optimizing for cross entropy.
• [Cao et al. 2006] – used SVMs optimizing for a modified ROC-Area.

Relaxations
• [Xu & Li 2007] – used boosting with an exponential loss relaxation.
Conventional SVMs
• Input examples denoted by x (high dimensional point)• Output targets denoted by y (either +1 or -1)
• An SVM learns a hyperplane w; predictions are sign(wᵀx)
• Training involves finding the w which minimizes
$$\min_{w,\,\xi \ge 0}\;\; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{i=1}^{N}\xi_i$$
• subject to
$$\forall i:\;\; y_i(w^T x_i) \ge 1 - \xi_i$$
• The sum of slacks Σᵢ ξᵢ upper bounds the accuracy loss
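As a concrete check of that slack bound, a small NumPy sketch (the data and variable names are illustrative, not from the slides) evaluates the objective and confirms that the summed slacks dominate the number of misclassifications:

```python
import numpy as np

def svm_objective(w, X, y, C):
    """(1/2)||w||^2 + (C/N) * sum_i xi_i, with hinge slacks
    xi_i = max(0, 1 - y_i * w^T x_i)."""
    slacks = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * w @ w + (C / len(y)) * slacks.sum()

# Any misclassified point has y_i * w^T x_i <= 0, so its slack is >= 1;
# hence the sum of slacks upper bounds the number of 0/1 errors.
X = np.array([[1.0, 2.0], [-1.0, 1.5], [0.5, -2.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([0.8, 0.3])
slacks = np.maximum(0.0, 1.0 - y * (X @ w))
errors = int(np.sum(np.sign(X @ w) != y))
assert slacks.sum() >= errors
```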
Adapting to Average Precision
• Let x denote the set of documents for a query
• Let y denote a (weak) ranking, with each yᵢⱼ ∈ {−1, 0, +1}
• Same objective function:
$$\min_{w,\,\xi \ge 0}\;\; \frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$$
• Constraints are defined for each incorrect labeling y' over the set of documents x:
$$\forall y' \ne y:\;\; w^T\Psi(x, y) \ge w^T\Psi(x, y') + \Delta(y, y') - \xi$$
• The joint discriminant score of the correct labeling must be at least as large as that of any incorrect labeling plus its performance loss.
![Page 8: A Support Vector Method for Optimizing Average Precision SIGIR 2007 Yisong Yue Cornell University In Collaboration With: Thomas Finley, Filip Radlinski,](https://reader036.vdocuments.net/reader036/viewer/2022062618/5513d64a5503466f748b4f47/html5/thumbnails/8.jpg)
Adapting to Average Precision
• Minimize
$$\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$$
subject to
$$\forall y' \ne y:\;\; w^T\Psi(x, y) \ge w^T\Psi(x, y') + \Delta(y, y') - \xi$$
where
$$\Psi(x, y') = \sum_{i:\,\text{rel}}\;\sum_{j:\,\text{non-rel}} y'_{ij}\,(x_i - x_j)$$
and
$$\Delta(y, y') = 1 - \text{AvgPrec}(y')$$
• The sum of slacks upper bounds the MAP loss.
• After learning w, a prediction is made by sorting on wᵀxᵢ
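Both pieces of the formulation are mechanical to write down. The sketch below (illustrative names, tiny arrays, no normalization of Ψ assumed) builds the joint feature map and makes a prediction by sorting on wᵀx:

```python
import numpy as np

def joint_feature_map(X_rel, X_non, y_pair):
    """Psi(x, y') = sum over relevant i and non-relevant j of
    y'_ij * (x_i - x_j), with y'_ij = +1 when i is ranked above j."""
    psi = np.zeros(X_rel.shape[1])
    for i, xi in enumerate(X_rel):
        for j, xj in enumerate(X_non):
            psi += y_pair[i, j] * (xi - xj)
    return psi

def predict_ranking(w, X):
    """Prediction: document indices sorted by descending score w^T x_i."""
    return np.argsort(-(X @ w))
```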
Too Many Constraints!
• For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front, e.g.,
• An incorrect labeling would be any other ranking, e.g.,
• This ranking has Average Precision of about 0.8, with Δ(y, y') ≈ 0.2
• Exponential number of rankings, thus an exponential number of constraints!
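Since the loss depends only on how the two relevance classes interleave, the distinct labelings per query correspond to interleavings, and their count explodes combinatorially; a quick stdlib check (the sizes n and R are made-up examples):

```python
from math import comb

# With n documents of which R are relevant, each distinct interleaving
# of the two relevance classes yields one candidate constraint.
n, R = 100, 10
num_rankings = comb(n, R)
print(num_rankings)  # 17310309456440: far too many to enumerate
```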
Structural SVM Training
• STEP 1: Solve the SVM objective function using only the current working set of constraints.
• STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
• STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add it to the working set.
• Repeat STEPs 1–3 until no additional constraints are added. Return the most recent model trained in STEP 1.

STEPs 1–3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005]
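The steps above can be sketched as a generic cutting-plane loop; `solve_qp` and `most_violated` are placeholder callables standing in for the problem-specific QP solver and separation oracle, not APIs from the paper:

```python
def structural_svm_train(solve_qp, most_violated, epsilon, max_iters=100):
    """Cutting-plane loop: grow a working set of constraints until the
    separation oracle finds nothing violated by more than epsilon."""
    working_set = []
    w = solve_qp(working_set)                                  # STEP 1
    for _ in range(max_iters):
        constraint, violation = most_violated(w, working_set)  # STEP 2
        if violation <= epsilon:                               # STEP 3: done
            break
        working_set.append(constraint)
        w = solve_qp(working_set)                              # re-solve
    return w, working_set

# Toy stubs: the "QP" just counts constraints, and the oracle stops
# reporting violations once three constraints are in the set.
w, ws = structural_svm_train(
    solve_qp=lambda ws: len(ws),
    most_violated=lambda w, ws: (f"c{len(ws)}", 1.0 if len(ws) < 3 else 0.0),
    epsilon=0.1)
```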
Illustrative Example
Original SVM Problem
• Exponential constraints
• Most are dominated by a small set of “important” constraints

Structural SVM Approach
• Repeatedly finds the next most violated constraint…
• …until the set of constraints is a good approximation.
Finding Most Violated Constraint
• The Structural SVM is an oracle framework.
• It requires a subroutine to find the most violated constraint.
  – Dependent on the formulation of the loss function and the joint feature representation.
• Exponential number of constraints!
• An efficient algorithm exists in the case of optimizing MAP.
![Page 17: A Support Vector Method for Optimizing Average Precision SIGIR 2007 Yisong Yue Cornell University In Collaboration With: Thomas Finley, Filip Radlinski,](https://reader036.vdocuments.net/reader036/viewer/2022062618/5513d64a5503466f748b4f47/html5/thumbnails/17.jpg)
Finding Most Violated Constraint
Observation
• MAP is invariant to the order of documents within a relevance class.
  – Swapping two relevant (or two non-relevant) documents does not change MAP.
• The joint SVM score is maximized by sorting documents within each class by score, wᵀx.
• Finding the most violated constraint thus reduces to finding an interleaving between two sorted lists of documents:
$$H(y'; w) = \Delta(y, y') + \sum_{i:\,\text{rel}}\;\sum_{j:\,\text{non-rel}} y'_{ij}\,(w^T x_i - w^T x_j)$$
Finding Most Violated Constraint
• Start with the perfect ranking
• Consider swapping adjacent relevant/non-relevant documents
• Find the best feasible ranking of the non-relevant document
• Repeat for the next non-relevant document
• Never want to swap past the previous non-relevant document
• Repeat until all non-relevant documents have been considered
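For intuition about what this greedy procedure computes, a brute-force version (illustrative only; the function name and scores are made up, and this is exponential where the paper's greedy algorithm is efficient) enumerates every interleaving of the score-sorted relevant and non-relevant documents and maximizes Δ(y, y') plus the pairwise score term:

```python
from itertools import combinations

def most_violated_bruteforce(s_rel, s_non):
    """Enumerate every interleaving of relevant scores s_rel and
    non-relevant scores s_non; return (H, positions) maximizing
    H(y') = (1 - AvgPrec(y')) + sum_ij y'_ij (s_i - s_j).
    Feasible only for tiny inputs."""
    n = len(s_rel) + len(s_non)
    best = (float("-inf"), None)
    for rel_positions in combinations(range(n), len(s_rel)):
        rel_set = set(rel_positions)
        # Average precision of this interleaving.
        hits, prec = 0, []
        for pos in range(n):
            if pos in rel_set:
                hits += 1
                prec.append(hits / (pos + 1))
        ap = sum(prec) / len(prec)
        # Pairwise term: y'_ij = +1 if relevant doc i is above
        # non-relevant doc j, -1 otherwise. Within each class the
        # documents are sorted by score (the slide's observation).
        non_positions = [p for p in range(n) if p not in rel_set]
        pair = 0.0
        for pi, si in zip(sorted(rel_set), sorted(s_rel, reverse=True)):
            for pj, sj in zip(non_positions, sorted(s_non, reverse=True)):
                sign = 1.0 if pi < pj else -1.0
                pair += sign * (si - sj)
        h = (1.0 - ap) + pair
        if h > best[0]:
            best = (h, rel_positions)
    return best
```

When the relevant document scores well above the non-relevant one, the maximizer keeps it on top; when the model mis-scores it, the maximizer pushes it down, exposing a violated constraint.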
Quick Recap
SVM Formulation
• SVMs optimize a tradeoff between model complexity and MAP loss
• Exponential number of constraints (one for each incorrect ranking)
• Structural SVMs find a small subset of important constraints
• Requires a sub-procedure to find the most violated constraint

Find Most Violated Constraint
• The loss function is invariant to re-ordering of the relevant documents
• The SVM score imposes an ordering of the relevant documents
• Reduces to finding an interleaving of two sorted lists
• The loss function has certain monotonic properties
• Efficient algorithm
Experiments
• Used TREC 9 & 10 Web Track corpus.
• Features of document/query pairs computed from outputs of existing retrieval functions.
(Indri Retrieval Functions & TREC Submissions)
• Goal is to learn a recombination of outputs which improves mean average precision.
Comparison with Best Base methods
[Bar chart: Mean Average Precision (y-axis, 0.10 to 0.30) across TREC 9 Indri, TREC 10 Indri, TREC 9 Submissions, TREC 10 Submissions, TREC 9 Submissions (without best), and TREC 10 Submissions (without best); series: SVM-MAP, Base 1, Base 2, Base 3]
Comparison with other SVM methods
[Bar chart: Mean Average Precision (y-axis, 0.10 to 0.30) across TREC 9 Indri, TREC 10 Indri, TREC 9 Submissions, TREC 10 Submissions, TREC 9 Submissions (without best), and TREC 10 Submissions (without best); series: SVM-MAP, SVM-ROC, SVM-ACC, SVM-ACC2, SVM-ACC3, SVM-ACC4]
Moving Forward
• Approach also works (in theory) for other measures.
• Some promising results when optimizing for NDCG (with only 1 level of relevance).
• Currently working on optimizing for NDCG with multiple levels of relevance.
• Preliminary MRR results not as promising.
Conclusions
• Principled approach to optimizing average precision.
(avoids difficult-to-control heuristics)
• Performs at least as well as alternative SVM methods.
• Can be generalized to a large class of rank-based performance measures.
• Software available at http://svmrank.yisongyue.com