Query Chains: Learning to Rank from Implicit Feedback

Paper Authors: Filip Radlinski, Thorsten Joachims

Presented By: Steven Carr

The Problem
• The results returned from web searches can be cluttered with results that the user considers irrelevant
• Search engines don’t learn from your document selections or from revisions to your query

Page Ranking
• Non-learning Methods
▫ Link-based (Google PageRank)
• Learning Methods
▫ Explicit user feedback: ask the user how relevant they found the result. Very accurate data, but very time-consuming to collect
▫ Implicit user feedback: determine the relevance by looking at search engine logs. Virtually unlimited data at a low cost, but requires interpretation

The Solution
• Automatically detect query chains
• Use query chains to infer the relevance of results within each query and between results from all queries in the chain
• Use a ranking Support Vector Machine (SVM) to learn a retrieval function from the inferred relevance judgments
• The Osmot search engine is based on this model

Query Chains
• People often reword their queries to get more useful results
▫ Spelling mistake
▫ Increased or decreased specificity
▫ New but related query
• A query chain is defined as a sequence of reformulated queries

Support Vector Machines
• Learning method used for classification
• Separates two classes of data points by generating a hyperplane that maximizes the margin, i.e. the distance from the hyperplane to the nearest data points of each class
• Uses the hyperplane to assign new data points to one of the two classes
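The margin idea above can be written compactly for the linearly separable case. The notation here is an assumption, since the slides do not define symbols: x_i are the training points, y_i in {-1, +1} are their class labels, and (w, b) define the hyperplane.

```latex
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i,
\qquad
\text{predict}(x) = \operatorname{sign}(w \cdot x + b)
```

Minimizing the norm of w under these constraints is equivalent to maximizing the margin, which equals 1 / ||w|| on each side of the hyperplane.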

Identifying Query Chains
• Manually labeled query chains from five weeks of Cornell University library search engine logs
• Used this data to train SVMs with various parameters, giving an accuracy of 94.3% and a precision of 96.5%
• A non-learning strategy of assuming all queries from the same IP address within a 30-minute period belong to the same chain gave an accuracy and precision of 91.6%
• The non-learning strategy was sufficiently accurate and less expensive, so the authors used it instead
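The non-learning heuristic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log record shape `(ip, timestamp, query)` and the function name `segment_query_chains` are hypothetical, and the 30-minute rule is read here as "at most 30 minutes between consecutive queries from the same IP", which is one possible interpretation of the slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed log record shape: (ip_address, timestamp, query_text).
CHAIN_WINDOW = timedelta(minutes=30)


def segment_query_chains(log_entries, window=CHAIN_WINDOW):
    """Group queries into chains: same IP, each query within `window` of the previous one."""
    by_ip = defaultdict(list)
    for ip, timestamp, query in log_entries:
        by_ip[ip].append((timestamp, query))

    chains = []
    for ip in sorted(by_ip):
        current = []
        last_time = None
        for timestamp, query in sorted(by_ip[ip]):
            # A gap longer than the window closes the current chain.
            if last_time is not None and timestamp - last_time > window:
                chains.append((ip, current))
                current = []
            current.append(query)
            last_time = timestamp
        if current:
            chains.append((ip, current))
    return chains


# Made-up log entries, for illustration only.
example_log = [
    ("10.0.0.1", datetime(2005, 1, 1, 9, 0), "oed"),
    ("10.0.0.1", datetime(2005, 1, 1, 9, 2), "oxford english dictionary"),
    ("10.0.0.2", datetime(2005, 1, 1, 9, 5), "course catalog"),
    ("10.0.0.1", datetime(2005, 1, 1, 11, 0), "library hours"),
]
example_chains = segment_query_chains(example_log)
```

In this made-up log, the first two queries from `10.0.0.1` form one chain, while the query two hours later starts a new one.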

Inferring Relevance
• Developed six strategies for generating feedback from query chains
▫ Click >q Skip Above: A clicked document is more relevant than any skipped (unclicked) document ranked above it
▫ Click First >q No-Click Second: Given the first two results, if the first was clicked and the second was not, the first is more relevant than the second
▫ Strategies 3 and 4 are the same as the first two, but with respect to the previous query
▫ Click >q’ Skip Earlier Query: A clicked document is more relevant than any document that was skipped in an earlier query
▫ Click >q’ Top Two Earlier Query: If nothing was clicked in an earlier query, a document clicked in a later query is more relevant than the top two results from that earlier query
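The strategies above can be sketched as functions that emit `(more_relevant, less_relevant)` pairs. This is a minimal sketch, not the authors' implementation. All function names are hypothetical, and "skipped" is assumed to mean "unclicked and ranked above the lowest click", which the slides do not spell out.

```python
def click_skip_above(ranking, clicked):
    """Click >q Skip Above: a clicked document beats each unclicked document ranked above it."""
    prefs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            prefs.extend((doc, above) for above in ranking[:pos] if above not in clicked)
    return prefs


def click_first_no_click_second(ranking, clicked):
    """Click First >q No-Click Second: first result clicked, second not."""
    if len(ranking) >= 2 and ranking[0] in clicked and ranking[1] not in clicked:
        return [(ranking[0], ranking[1])]
    return []


def skipped_documents(ranking, clicked):
    """Assumed meaning of 'skipped': unclicked documents ranked above the lowest click."""
    click_positions = [pos for pos, doc in enumerate(ranking) if doc in clicked]
    if not click_positions:
        return []
    return [doc for doc in ranking[:max(click_positions)] if doc not in clicked]


def earlier_query_preferences(earlier_ranking, earlier_clicked, later_clicked):
    """Covers the two earlier-query strategies (Skip Earlier Query, Top Two Earlier Query)."""
    if earlier_clicked:
        losers = skipped_documents(earlier_ranking, earlier_clicked)
    else:
        # Nothing clicked earlier: compare against the top two earlier results.
        losers = earlier_ranking[:2]
    return [(winner, loser)
            for winner in sorted(later_clicked)
            for loser in losers
            if winner != loser]


# Made-up two-query chain: no clicks on the first query, one click on the second.
earlier_ranking, earlier_clicked = ["a", "b", "c"], set()
later_ranking, later_clicked = ["d", "e", "f"], {"e"}

within_query = click_skip_above(later_ranking, later_clicked)
across_queries = earlier_query_preferences(earlier_ranking, earlier_clicked, later_clicked)
```

Here the click on `e` yields the within-query preference of `e` over `d`, and, because nothing was clicked in the earlier query, preferences of `e` over the earlier top two results `a` and `b`.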

Example

Learning Ranking Functions
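As a rough illustration of how preference pairs can drive a ranking function, the sketch below learns a linear scoring vector `w` so that preferred documents score higher than the documents they were preferred over. It is a simplified stand-in, assuming a plain subgradient update on a pairwise hinge loss rather than the solver used in the paper. The two feature columns and all names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_pairwise_ranker(preferences, n_features, epochs=200, lr=0.1, reg=0.01):
    """Learn w from (preferred_features, other_features) pairs.

    Each pair asks for w . (preferred - other) >= 1; violated pairs pull w
    toward the difference vector, and `reg` shrinks w slightly at every step.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in preferences:
            diff = [b - c for b, c in zip(better, worse)]
            w = [wi * (1.0 - lr * reg) for wi in w]   # L2 regularization step
            if dot(w, diff) < 1.0:                    # margin violated
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Hypothetical features per (query, document): [text_match_score, is_pdf].
example_preferences = [
    ([1.0, 0.0], [0.2, 1.0]),
    ([0.8, 1.0], [0.3, 0.0]),
]
weights = train_pairwise_ranker(example_preferences, n_features=2)


def score(features):
    """Rank documents for a query by sorting on this score, highest first."""
    return dot(weights, features)
```

After training on these two made-up pairs, `score` ranks each preferred document above its counterpart. A production system would use a dedicated SVM solver and far richer features.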

Experiment
• The Osmot search engine was created as a wrapper, implementing logging, analysis and ranking
• Users were presented with a combination of results from two different ranking functions
• The better ranking was determined based on which documents were clicked
• The evaluation was conducted over two months, collecting around 2400 queries
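The idea of showing a combined list and judging the two rankings by clicks can be sketched as follows. This is a deliberately simplified alternating merge with a basic click-credit rule, offered only as an assumption for illustration. The slides do not describe the exact combination scheme used in the experiment, and the function names are hypothetical.

```python
from itertools import zip_longest


def interleave(ranking_a, ranking_b):
    """Alternate between the two rankings, dropping documents already shown."""
    combined = []
    for doc_a, doc_b in zip_longest(ranking_a, ranking_b):
        for doc in (doc_a, doc_b):
            if doc is not None and doc not in combined:
                combined.append(doc)
    return combined


def preferred_ranking(ranking_a, ranking_b, clicked):
    """Credit each click to the ranking that placed the document higher."""
    def position(ranking, doc):
        return ranking.index(doc) if doc in ranking else float("inf")

    score_a = score_b = 0
    for doc in clicked:
        pos_a, pos_b = position(ranking_a, doc), position(ranking_b, doc)
        if pos_a < pos_b:
            score_a += 1
        elif pos_b < pos_a:
            score_b += 1
    if score_a == score_b:
        return "tie"
    return "A" if score_a > score_b else "B"


# Made-up rankings and click, for illustration only.
ranking_a = ["d1", "d2", "d3"]
ranking_b = ["d3", "d1", "d4"]
shown = interleave(ranking_a, ranking_b)
winner = preferred_ranking(ranking_a, ranking_b, clicked={"d3"})
```

In this example the user sees `d1, d3, d2, d4`, and a click on `d3` counts in favor of ranking B, which placed it first.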

Experiment Results
• Users preferred results from the query chain ranking function 53% of the time
• The model trained with query chains outperformed the model trained without query chains with 99% confidence

Conclusion
• Developed an algorithm to determine the relevance of a document from log entries
• Developed another algorithm that uses preference judgments to learn an improved ranking function
▫ The algorithm can learn to include documents that weren’t included in the original search results

Critique
• The learning method uses only log files rather than constantly updating itself
• The paper refers to other papers rather than explaining the concepts needed to understand it
• The paper doesn’t compare the effectiveness of its learning algorithm with that of other learning algorithms

Questions?
