context-aware query classification

Context-Aware Query Classification. Huanhuan Cao¹, Derek Hao Hu², Dou Shen³, Daxin Jiang⁴, Jian-Tao Sun⁴, Enhong Chen¹ and Qiang Yang². ¹University of Science and Technology of China, ²Hong Kong University of Science and Technology, ³Microsoft Corporation, ⁴Microsoft Research Asia


Post on 02-Jan-2016



TRANSCRIPT

Page 1: Context-Aware Query Classification

Context-Aware Query Classification

Huanhuan Cao¹, Derek Hao Hu², Dou Shen³, Daxin Jiang⁴, Jian-Tao Sun⁴, Enhong Chen¹ and Qiang Yang²

¹University of Science and Technology of China, ²Hong Kong University of Science and Technology, ³Microsoft Corporation, ⁴Microsoft Research Asia

Page 2: Context-Aware Query Classification

Motivation

• Understanding Web users' information needs is one of the most important problems in Web search.

• Such information can help improve the quality of many Web search services, such as:
– Ranking
– Online advertising
– Query suggestion, etc.

Page 3: Context-Aware Query Classification

Challenges

• The main challenges of query classification:
– Lack of feature information
– Ambiguity
– Multiple intents

• The first problem has been studied widely:
– Query expansion by top search results
– Leveraging a web directory

• However, the second and third problems are far from being solved.

Page 4: Context-Aware Query Classification

Why is context useful?

• For a given query, context means the previous queries and clicked URLs in the same session.

• It is assumed that:
– Context has a semantic relation with the current query.
– Context may help to assign appropriate categories to the current query.

• It therefore makes sense to exploit context to disambiguate the current query.

Page 5: Context-Aware Query Classification

Example

Page 6: Context-Aware Query Classification

Example

Page 7: Context-Aware Query Classification

Example

Page 8: Context-Aware Query Classification

Overview

• Problem statement
• Model query context by CRF
• Features of CRF
• Experiment
• Conclusion and future work

Page 9: Context-Aware Query Classification

Problem Statement: Context

• In a user search session, suppose the user has issued a series of queries q1q2…qT-1 and clicked some returned URLs U1U2…UT-1.

• If the user issues a query qT at time T, we call q1q2…qT-1 and U1U2…UT-1 the query context of qT.

• We call qt (t ∈ [1, T - 1]) the contextual queries of qT.
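
The definition above can be sketched as a small data structure. This is only an illustration: the names `SearchStep` and `split_context` and the sample queries and URL are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStep:
    """One step of a session: a query q_t and the URLs U_t clicked for it."""
    query: str
    clicked_urls: list[str] = field(default_factory=list)


def split_context(session: list[SearchStep]) -> tuple[list[SearchStep], str]:
    """Split a session q_1 ... q_T into (query context of q_T, q_T)."""
    if not session:
        raise ValueError("session must contain at least one query")
    *context, current = session
    return context, current.query


# Made-up session for illustration only.
session = [
    SearchStep("jaguar", ["example.org/cars"]),
    SearchStep("jaguar price"),
    SearchStep("jaguar dealer"),
]
context, q_T = split_context(session)
contextual_queries = [step.query for step in context]
```

Here `contextual_queries` holds q1 … qT-1 and `q_T` is the query to be classified.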

Page 10: Context-Aware Query Classification

Query Context

Query Context of {Q_T}

Page 11: Context-Aware Query Classification

Problem Statement: QC with Context and Taxonomy

• The objective of query classification (QC) with context is to classify a user query qT into a ranked list of K categories cT1, cT2, ..., cTK, among Nc categories {c1,c2,…,cNc}, given the context of qT.

• A target taxonomy Υ is a tree of categories in which {c1,c2,…,cNc} are the leaf nodes.

Page 12: Context-Aware Query Classification

Modeling Query Context by CRF

where q represents q1q2…qt
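
As a rough guide to what a linear-chain CRF computes, the sketch below scores a label sequence c for a query sequence q as a sum of local (label, query) weights and adjacent-label weights, then normalises over all label sequences by brute force. This is the textbook form, not necessarily the exact model on the slide; the weights, category names and queries are invented, and a real implementation would use dynamic programming instead of enumeration.

```python
import itertools
import math


def sequence_score(labels, queries, local, transition):
    """Unnormalised log-score: local weights plus adjacent-label weights."""
    score = sum(local.get((c, q), 0.0) for c, q in zip(labels, queries))
    score += sum(transition.get(pair, 0.0) for pair in zip(labels, labels[1:]))
    return score


def crf_probability(labels, queries, categories, local, transition):
    """p(c | q), normalised by enumerating every possible label sequence."""
    z = sum(
        math.exp(sequence_score(cand, queries, local, transition))
        for cand in itertools.product(categories, repeat=len(queries))
    )
    return math.exp(sequence_score(labels, queries, local, transition)) / z


# Hypothetical weights: "java" alone is ambiguous, the travel context is not.
categories = ["Travel", "Computers"]
queries = ["hotel booking", "java"]
local = {
    ("Travel", "hotel booking"): 2.0,
    ("Travel", "java"): 1.0,
    ("Computers", "java"): 1.0,
}
transition = {("Travel", "Travel"): 1.5}

probs = {
    cand: crf_probability(cand, queries, categories, local, transition)
    for cand in itertools.product(categories, repeat=len(queries))
}
best = max(probs, key=probs.get)
```

With these made-up weights the local evidence for "java" is tied, so the adjacent-label weight decides the label of the second query.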

Page 13: Context-Aware Query Classification

Why CRF?

• The two main advantages of CRF are:
– 1) It can incorporate general feature functions to model the relation between observations and unobserved states;
– 2) It doesn't need prior knowledge of the type of conditional distribution.

• Given 1), we can incorporate some external web knowledge.

• Given 2), we don’t need any assumptions about the type of p(c|q).

Page 14: Context-Aware Query Classification

Features of CRF

• When we use CRF to model query context, one of the most important parts is choosing effective feature functions.

• We should consider:
– Relevance between queries and category labels, to leverage the local information of queries;
– Relevance between adjacent labels, to leverage contextual information.

Page 15: Context-Aware Query Classification

Relevance between queries and category labels

• Term occurrence
– The terms of qt are obvious features for supporting ct
– Due to the limited size of the training data, many useful terms that indicate category information may not be covered.

• General label confidence
– Leverage an external web directory such as Google Directory;
– M means the number of returned results and Mct,qt means the number of returned results with label ct after mapping.
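
The slide defines M and Mct,qt but its formula is not present in the text. A natural reading, assumed here, is that the confidence is the fraction of returned directory results that map to label ct. The function name and sample labels below are hypothetical.

```python
def general_label_confidence(result_labels, category):
    """Fraction of directory results for a query that map to `category`.

    Assumption: confidence = M_ct,qt / M, where M = len(result_labels).
    Returning 0.0 when no results come back is a choice made for this sketch.
    """
    if not result_labels:
        return 0.0
    matches = sum(1 for label in result_labels if label == category)
    return matches / len(result_labels)


# Made-up mapped labels of the results returned for one query.
mapped = ["Travel", "Travel", "Computers", "Travel"]
travel_conf = general_label_confidence(mapped, "Travel")
```

Because the value depends only on the query and the directory, it can be computed even for queries whose terms never appear in the training data.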

Page 16: Context-Aware Query Classification

Relevance between queries and category labels

• Click-aware label confidence
– Combine the click information with the knowledge of an external web directory;
– CConf(ct ,ut) can be calculated by multiple approaches.
– Here, we use VSM to calculate the cosine similarity between the term vectors of ct and ut
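
A minimal sketch of the vector space model (VSM) cosine similarity mentioned above. It assumes raw term-frequency vectors; the slides do not say which term weighting is used, and the term lists are invented.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two term-frequency vectors."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    # An empty vector has no direction; treat its similarity as 0.
    return dot / norm if norm else 0.0


# Hypothetical terms describing a category c_t and a clicked URL u_t.
category_terms = ["travel", "guide", "hotel"]
url_terms = ["travel", "hotel", "travel"]
cconf = cosine_similarity(category_terms, url_terms)
```

How the term vector of a category or a clicked URL is built (for example from directory snippets or page text) is left open here.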

Page 17: Context-Aware Query Classification

Relevance between Adjacent Labels

• Direct relevance between adjacent labels
– Occurrence of adjacent label pair <ct-1,ct>
– The weight implies how likely the two labels are to co-occur

• Taxonomy based relevance between adjacent labels
– Limited by the sampling approach and the size of the training data, some reasonable adjacent label pairs may not occur proportionally, or may not occur at all.
– Consider indirect relevance between adjacent labels by considering the taxonomy.
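
One way to picture the two kinds of relevance is to count adjacent label pairs twice: once on the labels themselves, and once after truncating each label to an ancestor level of the taxonomy. This is an assumed illustration rather than the slides' exact feature definition; the "/"-separated label paths and the sample sequences are made up.

```python
from collections import Counter


def adjacent_pair_counts(label_sequences, level=None):
    """Count adjacent label pairs <c_{t-1}, c_t> over labeled sessions.

    If `level` is given, each label path such as "Living/Travel" is first
    truncated to its first `level` components (its ancestor at that depth).
    """
    counts = Counter()
    for seq in label_sequences:
        if level is not None:
            seq = ["/".join(label.split("/")[:level]) for label in seq]
        counts.update(zip(seq, seq[1:]))
    return counts


# Made-up label sequences, one per session.
sequences = [
    ["Living/Travel", "Living/Travel"],
    ["Living/Travel", "Living/Food"],
    ["Computers/Software", "Living/Food"],
]
direct = adjacent_pair_counts(sequences)
by_parent = adjacent_pair_counts(sequences, level=1)
```

The pair <Living/Food, Living/Travel> never occurs directly in this toy data, yet its parents <Living, Living> co-occur twice, which is the kind of indirect evidence the taxonomy can supply.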

Page 18: Context-Aware Query Classification

Experiment

• Data set:
– 10,000 randomly selected sessions from one day’s search log of a commercial search engine.
– Three labelers first label all possible categories from the KDDCUP’05 taxonomy for each unique query of the training data.

Page 19: Context-Aware Query Classification

Examples of multiple category queries

The large proportion of multiple-category queries shows how difficult QC is without context.

Page 20: Context-Aware Query Classification

Label Sessions

• The three human labelers are then asked to cross-label each session of the data set with a sequence of level-2 category labels.

• For each query, a labeler gives the most appropriate category label by considering:
– The query itself;
– The query context;
– The clicked URLs of the query.

Page 21: Context-Aware Query Classification

Tested Approaches

• Baselines:
– Non context-aware baseline: bridging classifier (BC) proposed by Shen et al.
– Naïve context-aware baseline: collaborating classifier (CC), which combines a test query with the previous query and classifies the result with BC.

• CRFs:
– CRF-B: CRF with basic features (term occurrence, general label confidence and direct relevance between adjacent labels)
– CRF-B-C: CRF with basic features + click-aware label confidence
– CRF-B-C-T: CRF with basic features + click-aware label confidence + taxonomy based relevance

Page 22: Context-Aware Query Classification

Evaluation Metrics

• Given a test session q1q2…qT, we take qT as the test query and the queries q1q2…qT-1 with their corresponding clicked URL sets U1U2…UT-1 as the query context.

• For qT, we evaluate a tested approach by:
– Precision (P): δ(cT ∈ CT,K)/K
– Recall (R): δ(cT ∈ CT,K)
– F1 score (F1): 2*P*R/(P+R)

where cT is the ground truth label and CT,K is the set of the top K labels. δ(*) is a Boolean function indicating whether * is true (=1) or false (=0).
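
The three metrics can be computed for a single test query as in the sketch below. The function name and sample labels are hypothetical, and returning F1 = 0 when the true label is missed (where P + R = 0) is a convention assumed here, since the slide does not cover that case.

```python
def evaluate_top_k(true_label, ranked_labels, k):
    """Precision, recall and F1 of the top-k labels for one test query."""
    hit = 1.0 if true_label in ranked_labels[:k] else 0.0  # delta(c_T in C_T,K)
    precision = hit / k
    recall = hit
    # Assumed convention: F1 is 0 when the true label is not in the top k.
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f1


# Made-up ranked output of a classifier for one test query.
ranked = ["Computers", "Travel", "Shopping"]
p, r, f1 = evaluate_top_k("Travel", ranked, k=3)
```

Since each query has exactly one ground truth label, a hit at K gives P = 1/K and R = 1, so F1 = 2/(K+1); larger K raises the chance of a hit but lowers the precision of each hit.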

Page 23: Context-Aware Query Classification

Overall results

1) The naïve context-aware baseline consistently outperforms the non context-aware baseline.

2) CRFs consistently outperform the two baselines.

3) CRF-B-C-T > CRF-B-C > CRF-B: click information and taxonomy based relevance are useful.

Page 24: Context-Aware Query Classification

Case study

Context about travel

Click a travel guide web page

Give the most appropriate label in the first position

Page 25: Context-Aware Query Classification

Efficiency of Our Approach

• Offline training:
– Each iteration takes about 300ms
– Time cost of training a CRF is acceptable

• Online cost:
– Calculating features
  • Label confidence

Page 26: Context-Aware Query Classification

Conclusion and Future Work

• In this paper, we propose a novel approach for query classification by modeling query context via CRFs.

• Experiments on a real search log clearly show that our approach outperforms a non context-aware baseline and a naïve context-aware baseline.

• The current approach cannot leverage contextual information for queries at the beginning of sessions, which motivates our future research on leveraging contextual information from beyond the session.

Page 27: Context-Aware Query Classification

Thanks