context-sensitive query auto-completion

1

Context-Sensitive Query Auto-Completion

AUTHORS: NAAMA KRAUS AND ZIV BAR-YOSSEF

DATE OF PUBLICATION: NOVEMBER 2010

SPEAKER: RISHU GUPTA

2

Motivating Example

I want to buy a good Digital Camera

Current Result vs. Desired Result (completions shown on the slide):

• digital camera reviews
• digital camera buying guide
• digital camera with wifi
• digital camera deals
• digital camera world
• digital picture frame
• digital copy

3

Most Challenging Auto-Completion Scenario

Challenge: Query Auto-Completion predicts the user's correct query with a probability of only 12.8%.

Goal: To predict the user's intended query reliably when the user has entered only one character.

Advantages:
◦ Makes the search experience faster
◦ Reduces load on servers in Instant Search

4

QAC Algorithms

• The user enters the prefix “x” of a query “q”
• The QAC algorithm returns a list of “K” completions, ordered by quality score
• A “hit” occurs if “c” = “q” for a completion “c” in the top-K completion list
• The QAC algorithm should also work if “c” is semantically equal to “q”
• An efficient data structure is needed for faster lookup: hash table or trie
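The prefix lookup described on this slide can be sketched with a trie. This is a minimal illustration, not the authors' implementation: the class name `CompletionTrie`, the helper `is_hit`, and the query counts used as quality scores are all hypothetical.

```python
import heapq


class CompletionTrie:
    """Prefix tree that stores a quality score for each complete query."""

    def __init__(self):
        self.children = {}
        self.score = None  # set only on nodes that end a complete query

    def insert(self, query, score):
        node = self
        for ch in query:
            node = node.children.setdefault(ch, CompletionTrie())
        node.score = score

    def top_k(self, prefix, k):
        # Walk down to the node for prefix x; a missing node means no completions.
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored query below that node.
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.score is not None:
                found.append((current.score, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Keep the k completions with the highest quality score.
        best = heapq.nlargest(k, found)
        return [text for _, text in best]


def is_hit(completions, intended_query):
    """A hit occurs if some completion c equals the intended query q."""
    return intended_query in completions


# Hypothetical query counts standing in for quality scores.
trie = CompletionTrie()
for query, count in [
    ("digital camera reviews", 50),
    ("digital camera deals", 40),
    ("digital picture frame", 30),
    ("digital copy", 20),
    ("dog food", 10),
]:
    trie.insert(query, count)

completions = trie.top_k("d", 3)
```

This sketch scans the whole subtree under the prefix on every lookup; a production system would more likely precompute the top completions per node so that a one-character prefix does not trigger a large traversal.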

5

Context-Sensitive Auto-Completion

How to compensate for the lack of information?

Observation:
• User searches within some context.
• User context reflects user's intent.

Context examples
• Recent queries
• Recently visited pages
• Recent Tweets
• etc.

Our focus: “Recent queries”
• Accessible by search engines
• 49% of searches are preceded by a different query in the same session
• For simplicity, in this presentation we focus on the most recent query

6

Recent Query Use Approaches

Cluster Similar Queries (use of techniques like HMMs)

Nearest Completion Algorithm (assumption: the context is relevant to the query)

Generalize the Most Popular Completion Algorithm

• None of these previous studies took the user input (prefix) into account in the prediction

• In 37% of the query pairs, the former query has not occurred in the log before

Problem with this approach?

How to tackle this problem?

7

Nearest Completion: Measure of Similarity

Challenge: Choosing a similarity measure that is correlated and universally applicable

Completions must be semantically related to the context query.

Recommendation Based Query Expansion

• Represent queries and contexts as high-dimensional term-weighted vectors and use cosine similarity to compare them.

• Idea: a rich representation of a query is constructed not from its search results, but rather from its recommendation tree.
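The vector comparison above can be sketched as follows. This is only an illustration under stated assumptions: the function names `cosine_similarity` and `nearest_completions` are hypothetical, and the term weights are hand-written toy values, whereas the slides build each vector from the query's recommendation tree (that construction is not shown here).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse term-weighted vectors (term -> weight)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def nearest_completions(context_vector, candidate_vectors, k):
    """Rank candidate completions by similarity to the context query."""
    ranked = sorted(
        candidate_vectors,
        key=lambda query: cosine_similarity(context_vector, candidate_vectors[query]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors; real weights would come from the rich query representation.
context = {"digital": 1.0, "camera": 1.0}
candidates = {
    "dog food": {"dog": 1.0, "food": 1.0},
    "digital picture frame": {"digital": 1.0, "picture": 1.0, "frame": 1.0},
    "digital camera reviews": {"digital": 1.0, "camera": 1.0, "reviews": 1.0},
}
ranking = nearest_completions(context, candidates, 3)
```

Sparse dictionaries keep the sketch short; with high-dimensional vectors the same computation is usually done over an inverted index or a sparse matrix.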

Recommendation Based Query

• Outputs a list of recommendations, which are reformulations of the previous query.

• A problem occurs when none of the recommendations is compatible with the user's query.

How to overcome this challenge?

8

Evaluation

EVALUATION METRIC

MRR (Mean Reciprocal Rank)
• A standard IR measure to evaluate the retrieval of a specific object at a high rank

wMRR (Weighted MRR)
• Weights sample pairs according to “prediction difficulty” (total # of candidate completions)
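Both metrics can be sketched in a few lines. The function names are hypothetical, and the weighting scheme is an assumption: the slide only says that pairs are weighted by prediction difficulty, so the sketch takes the weights as an input (for example, the number of candidate completions per sample) and computes a weighted average.

```python
def reciprocal_rank(completions, intended_query):
    """1 / rank of the intended query in the list, or 0 if it is absent."""
    for rank, completion in enumerate(completions, start=1):
        if completion == intended_query:
            return 1.0 / rank
    return 0.0


def mrr(samples):
    """Mean reciprocal rank over (completions, intended_query) samples."""
    return sum(reciprocal_rank(c, q) for c, q in samples) / len(samples)


def weighted_mrr(samples, weights):
    """Weighted mean of reciprocal ranks.

    Assumption: weights are supplied by the caller, e.g. the total number
    of candidate completions for each sample's prefix.
    """
    total = sum(weights)
    weighted_sum = sum(
        w * reciprocal_rank(c, q) for (c, q), w in zip(samples, weights)
    )
    return weighted_sum / total


# Toy samples: reciprocal ranks are 1/2, 1, and 0 (intended query missing).
samples = [
    (["a", "b", "c"], "b"),
    (["x", "y"], "x"),
    (["m"], "z"),
]
plain = mrr(samples)
weighted = weighted_mrr(samples, [4, 1, 5])
```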

EVALUATION FRAMEWORK

Evaluation Set
• A random sample of (context, query) pairs from the AOL log

Prediction Task
• Given the context query and the first character of the intended query, predict the intended query at as high a rank as possible

9

Analysis

NearestCompletion

• Fails when the context is irrelevant (difficult to predict whether the context is relevant)

MostPopularCompletion

• Fails when the intended query is not highly popular (long tail)

Solution: HybridCompletion

• HybridCompletion: a combination of MostPopularCompletion and NearestCompletion
• Its MRR is 31.5% higher than that of MostPopularCompletion.

10

Most Popular VS Nearest Completion

Relevant Context: The MRR of NearestCompletion (with depth-3 traversal) is 48% higher than that of MostPopularCompletion.

NearestCompletion becomes destructive, so its MRR is 19% lower than that of MostPopularCompletion.

11

How Hybrid Completion Works

Produce Lists
• Produce top k completions of NearestCompletion
• Produce top k completions of MostPopularCompletion

Standardize
• The two lists differ in units and scale

Hybrid Score is a Convex Combination
• hybscore(q) = α · Zsimscore(q) + (1 − α) · Zpopscore(q)
• 0 ≤ α ≤ 1 is a tunable parameter
• Prior probability that context is relevant
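The three steps above can be sketched as follows. Several details are assumptions rather than facts from the slides: standardization is taken to be a z-score (suggested by the “Z” prefix in the formula), a candidate missing from one list receives that list's lowest standardized score, and the function names and toy scores are hypothetical.

```python
import statistics


def standardize(scores):
    """Map raw scores to z-scores so the two lists share one scale.

    Assumption: z-score standardization, (x - mean) / standard deviation.
    """
    if not scores:
        return {}
    values = list(scores.values())
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return {query: 0.0 for query in scores}
    return {query: (score - mean) / stdev for query, score in scores.items()}


def hybrid_completions(sim_scores, pop_scores, alpha, k):
    """Rank by hybscore(q) = alpha * Zsimscore(q) + (1 - alpha) * Zpopscore(q)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    z_sim = standardize(sim_scores)
    z_pop = standardize(pop_scores)
    # Assumption: a candidate absent from one list gets that list's minimum.
    floor_sim = min(z_sim.values(), default=0.0)
    floor_pop = min(z_pop.values(), default=0.0)
    hyb = {
        query: alpha * z_sim.get(query, floor_sim)
        + (1 - alpha) * z_pop.get(query, floor_pop)
        for query in set(z_sim) | set(z_pop)
    }
    # Highest hybrid score first; ties broken alphabetically for determinism.
    return sorted(hyb, key=lambda query: (-hyb[query], query))[:k]


# Toy scores: similarities in [0, 1] and popularity counts on another scale.
sim = {"a": 0.9, "b": 0.1, "c": 0.5}
pop = {"a": 10, "b": 1000, "c": 100}
context_only = hybrid_completions(sim, pop, alpha=1.0, k=3)
popularity_only = hybrid_completions(sim, pop, alpha=0.0, k=3)
```

Setting α to 1 reproduces the NearestCompletion ordering and setting it to 0 reproduces the MostPopularCompletion ordering, which matches the reading of α as the prior probability that the context is relevant.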

MostPopular, Nearest, and Hybrid (2)

HybridCompletion is shown to be at least as good as NearestCompletion when the context is relevant and almost as good as MostPopularCompletion when the context is irrelevant.

13

Examples

14

Conclusion

Query Auto-Completion

MostPopularCompletion Algorithm
• Based on popular queries (AOL query log)

Context-Sensitive Query Auto-Completion

NearestCompletion Algorithm
• Relevant context: based on the user's recent queries
• Recommendation-based algorithm: rich query representation

HybridCompletion Algorithm
• Convex combination of NearestCompletion and MostPopularCompletion

15

Future

• NearestCompletion: a more effective session segmentation technique

• Predicting the first query in a session still remains an open problem

• Use of other context resources, like recently visited web pages or search history

• The measure used for quality evaluation should be more relaxed

• The rich query representation may be further fine-tuned.
