1 overview of information retrieval and our solutions qiang yang department of computer science and...

Overview of Information Retrieval and our Solutions

Qiang Yang

Department of Computer Science and EngineeringThe Hong Kong University of Science and Technology

Hong Kong

Why Need Information Retrieval (IR)?

More and more online information in general (Information Overload)

Many tasks rely on effective management and exploitation of information

Textual information plays an important role in our lives

Effective text management directly improves productivity

What is IR?

Narrow-sense: IR= Search engine technologies

(Google/Yahoo!/Live Search) IR= Text matching/classification

Broad-sense: IR = Text information management:

How to find useful information? (info. retrieval) (e.g., Yahoo!)

How to organize information? (text classification) (e.g., automatically assign email to different folders)

How to discover knowledge from text? (text mining) (e.g., discover correlation of events)

Difficulties Huge Amount of Online Data

Yahoo! has nearly 20 billion pages in its index (as collected at the beginning of 2005)

Different types of data Web-pages, emails, blogs, chatting-room

messages; Ambiguous Queries

Short: 2-4 words Ambiguous: apple; bank…

Our Solutions Query Classification

Champion of KDDCUP’05; TOIS (Vol. 24); SIGIR’06; KDD Exploration (Vol. 7)

Query Expansion/Suggestion Submissions to: SIGIR’07; AAAI’07; KDD’07

Entity Resolution Submission to SIGIR’07

Web page Classification/Clustering SIGIR’04; CIKM’04; ICDM’04; ICDE’06; WWW’06; IPM (2007),

DMKD (Vol. 12) Document Summarization

SIGIR’05; IJCAI’07 Analysis of Blogs, Emails, Chatting-room

messages SIGIR’06; ICDM’06 (2); IJCAI’07

Outline

Query Classification (QC) Introduction Solution 1: Query/category

enrichment; Solution 2: Bridging classifiers;

Entity Resolution Summary of Other works

Query Classification

Introduction Web-Query is difficult to manage:

Short; Ambiguous; Evolving

Query Classification (QC) can help to understand query better

Vertical Search Re-rank search results Online Advertisements

Difficulties of QC (Different from text classification) How to represent queries Target taxonomy is dynamic, e.g. online ads

taxonomy Training data is difficult to collect

Problem Definition

Inspired by the KDDCUP’05 competition Classify a query into a ranked list of

categories Queries are collected from real search

engines Target categories are organized in a tree

with each node being a category

Related Work

Document Classification Feature selection [Yang et al. 1997] Feature generation [Cai et al. 2003] Classification algorithms

Naïve Bayes [Andrew and Nigam 1998] KNN [Yang 1999] SVM [Joachims 1999] ……

An overall survey in [Sebastiani 2002]

Related work Query Classification/Clustering

Classify the Web queries by geographical locality [Gravano 2003];

Classify queries according to their functional types [Kang 2003];

Beitzel et al. studied the topical classification as we do. However they have manually classified data [Beitzel 2005];

Beeferman and Wen worked on query clustering using clickthrough data respectively [Beeferman 2000; Wen 2001];

1 overview of information retrieval and our solutions qiang yang department of computer science and...

text information management

online information

query clustering

text classificationhow

text mining

useful information

liveseffective text

topical classification

Documents

1 text mining and information retrieval qiang yang hkust...

€¦ · cs-590i information retrieval retrieval models luo...

1 towards heterogeneous transfer learning qiang yang hong...

hua qiang bei map

information retrieval computer science tripos part ii ·...

1 evaluating high accuracy retrieval techniques chirag...

chen shaojie*, ren jianxi, li qiang, yang guang and yan...

university of michigan electrical engineering and computer...

introduction to information...

11 selected applications of transfer learning 杨强，...

information retrieval -...

retrieval 2/2 bdk12-6 information retrieval william hersh,...

notes information retrieval tools: catalogues, indexes ......

private information retrieval yuval ishai computer science...

1 image retrieval hao jiang computer science department 2009

doc retrieval -...

the art and science of image retrieval

optimization of a finite state space for information...

department of computer science & information engineering...

media retrieval information retrieval image retrieval video...