Exploiting Ranking Factorization Machines for Microblog Retrieval

Runwei Qiang, Feng Liang, Jianwu Yang. Institute of Computer Science and Technology, Peking University. CIKM 2013.

Upload: runwei-qiang

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Exploiting Ranking Factorization Machines for Microblog Retrieval

Exploiting Ranking Factorization Machines for Microblog Retrieval

Runwei Qiang, Feng Liang, Jianwu Yang

Institute of Computer Science and Technology, Peking University

CIKM 2013

Page 2: Exploiting Ranking Factorization Machines for Microblog Retrieval

Problem Definition

Real-time Search: "At time t, find tweets about topic X." (TREC’2011)

[Diagram: a tweet collection and queries Q1, Q2, …, Qn given as timestamped pairs (Q1, t1), (Q2, t2), …, (Qn, tn). Labels: "timestamp", "relevance", "Not Available !!", "ranking".]

Page 3: Exploiting Ranking Factorization Machines for Microblog Retrieval

Motivations

IR for microblogs is a non-trivial problem

Documents are very short

Severe vocabulary-mismatch problem: how should query expansion techniques be applied?

Abundance of shortened URLs

They offer a way to expand documents, but how can they be used?

Large quantities of pointless babble

How can tweet quality be used to filter out non-informative messages?

Page 4: Exploiting Ranking Factorization Machines for Microblog Retrieval

Motivations

Learning-to-rank methods can make full use of different models or factors in microblog retrieval

different factors => different features

Many features have proven useful

Semantic features between query and document

Tweet quality features, i.e. link, retweet, and mention (count/binary)

Page 5: Exploiting Ranking Factorization Machines for Microblog Retrieval

Limitations

Features are considered independent

Some features are closely related to each other.

RT and @ symbols occur in the same tweet frequently.

Feature utilization

Link feature: binary => semantic information

Example of a linked page title: "Small plane crashes at big airport; no one notices - CNN.com"

Page 6: Exploiting Ranking Factorization Machines for Microblog Retrieval

Proposal

Employ a Ranking FM framework

Adopts FM as the ranking function to model interactions between features

Utilizes several effective features that are neglected in existing work

Optimize Ranking FM with two optimization methods

Stochastic Gradient Descent

Adaptive Regularization

Page 7: Exploiting Ranking Factorization Machines for Microblog Retrieval

Outline

Ranking FM for Microblog Retrieval

Ranking FM Framework

Optimization Methods

Feature Description

Experiments

Summary

Page 8: Exploiting Ranking Factorization Machines for Microblog Retrieval

Ranking FM Framework

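Pairwise approach

Training examples $(x_p, y_p)$, $(x_q, y_q)$ are combined into pairwise instances:

$$(x_p, x_q, z), \qquad z = \begin{cases} +1, & y_p > y_q \\ -1, & y_q > y_p \end{cases}$$

Loss function

$$\min_{\Theta} L = \sum_{t=1}^{l} \ell\big(f(x_p^{(t)}, x_q^{(t)}; \Theta),\, z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$$

$\ell$: hinge loss function

$f$: FM ranking function

$\lambda \lVert \Theta \rVert^2$: regularization term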
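The pairwise construction and hinge loss on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the names `make_pairs` and `hinge_loss` are hypothetical, and it assumes the pairwise ranking function is the difference of the two items' scores, which the slide does not spell out.

```python
from itertools import combinations


def make_pairs(examples):
    """Turn (features, relevance) examples for one query into (x_p, x_q, z) triples.

    z = +1 when the first item is more relevant, -1 when the second is.
    Ties express no preference and are skipped.
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(examples, 2):
        if y_p == y_q:
            continue
        pairs.append((x_p, x_q, 1 if y_p > y_q else -1))
    return pairs


def hinge_loss(score_p, score_q, z):
    """Hinge loss on the score difference (assumption: f = score_p - score_q).

    The loss is zero once the signed margin z * (score_p - score_q) reaches 1.
    """
    return max(0.0, 1.0 - z * (score_p - score_q))


# Toy data: three tweets for one query, with made-up features and labels.
examples = [([1.0, 0.0], 2), ([0.5, 1.0], 0), ([0.2, 0.3], 2)]
pairs = make_pairs(examples)
labels = [z for _, _, z in pairs]
```

Only pairs with different relevance labels produce a training instance, so the tied first and third tweets above contribute nothing.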

Page 9: Exploiting Ranking Factorization Machines for Microblog Retrieval

Factorization Machines Model

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

nested interactions, factorized parameters

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$

$k$: factorization dimensionality

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$

Complexity: $O(k \cdot n)$
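The two forms of the FM model can be checked against each other numerically. The sketch below uses plain Python lists; the function names and the toy parameter values are made up for illustration. It evaluates the nested pairwise form directly and then the reformulated form, which needs only O(k·n) operations.

```python
def fm_predict_naive(x, w0, w, V):
    """Nested form: bias + linear terms + <v_i, v_j> x_i x_j over all pairs i < j."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Reformulated form: one pass over the features per factor, O(k * n) overall."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))          # (sum_i v_if x_i)
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 x_i^2
        score += 0.5 * (s * s - s2)
    return score


# Toy example with n = 3 features and k = 2 factors (values are arbitrary).
x = [1.0, 2.0, 3.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same score here (10.9 up to floating-point rounding); the fast form avoids the double loop over feature pairs.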

Page 10: Exploiting Ranking Factorization Machines for Microblog Retrieval

Learn Ranking FM

Stochastic Gradient Descent

Grid search on the validation set to find the best λ

Adaptive Regularization [2]

Training set $S_T$:

$$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(x,y) \in S_T} \ell\big(\hat{y}(x \mid \Theta),\, y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$

Validation set $S_V$:

$$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(x,y) \in S_V} \ell\big(\hat{y}(x \mid \Theta^{(t+1)}),\, y\big)$$

Slide callouts: "adapt the regularization automatically", "time-consuming"
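The SGD variant can be illustrated with a single-pair update. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the pairwise score is the difference of two FM scores (so the bias cancels), uses a fixed λ as in the grid-search setting rather than adaptive regularization, and the learning rate, initialization and toy feature vectors are all made up.

```python
import random


def fm_score(x, w, V):
    """FM score without the bias w0, which cancels in a score difference."""
    n, k = len(x), len(V[0])
    score = sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s2)
    return score


def sgd_step(x_p, x_q, z, w, V, lr=0.01, lam=0.001):
    """One in-place SGD update for hinge loss plus L2 regularization on one pair."""
    n, k = len(w), len(V[0])
    margin = z * (fm_score(x_p, w, V) - fm_score(x_q, w, V))
    active = margin < 1.0  # the hinge loss has zero gradient otherwise
    # Per-factor sums sum_j v_jf x_j, computed before any parameter changes.
    s_p = [sum(V[j][f] * x_p[j] for j in range(n)) for f in range(k)]
    s_q = [sum(V[j][f] * x_q[j] for j in range(n)) for f in range(k)]
    for i in range(n):
        for f in range(k):
            grad = 2.0 * lam * V[i][f]
            if active:
                d_p = x_p[i] * s_p[f] - V[i][f] * x_p[i] ** 2
                d_q = x_q[i] * s_q[f] - V[i][f] * x_q[i] ** 2
                grad -= z * (d_p - d_q)
            V[i][f] -= lr * grad
        grad = 2.0 * lam * w[i]
        if active:
            grad -= z * (x_p[i] - x_q[i])
        w[i] -= lr * grad


# Toy run: the first item should end up scored above the second (z = +1).
random.seed(0)
n, k = 2, 3
w = [0.0] * n
V = [[random.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n)]
x_p, x_q, z = [1.0, 1.0], [1.0, 0.0], 1
for _ in range(200):
    sgd_step(x_p, x_q, z, w, V, lr=0.05, lam=0.001)
margin = fm_score(x_p, w, V) - fm_score(x_q, w, V)
```

With a fixed λ, each candidate value would require a full training run like this one plus an evaluation on the validation set, which is the cost that grid search adds.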

Page 11: Exploiting Ranking Factorization Machines for Microblog Retrieval

Feature Description

Content Relevance Features (3)

Query & Tweet

BM25, TFIDF, Language Model Score

Semantic Expansion Features (3x3=9)

Query & Topic info; Expanded query & Tweet; Expanded query & Topic info

BM25, TFIDF, Language Model Score

Quality Features (5)

mention, retweet, hashtag, and link binary features

tweet length
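The five quality features could be extracted along the following lines. This is a rough sketch: the regular expressions, the feature names, and the choice to measure tweet length in whitespace-separated tokens are all assumptions, since the slides do not say how each feature is computed.

```python
import re

# Heuristic patterns (assumptions, not taken from the slides).
MENTION = re.compile(r"@\w+")
RETWEET = re.compile(r"\bRT\b")
HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")


def quality_features(tweet):
    """Return four binary indicators plus the tweet length in tokens."""
    return {
        "has_mention": int(bool(MENTION.search(tweet))),
        "has_retweet": int(bool(RETWEET.search(tweet))),
        "has_hashtag": int(bool(HASHTAG.search(tweet))),
        "has_link": int(bool(LINK.search(tweet))),
        "length": len(tweet.split()),  # assumption: length counted in tokens
    }


# The URL below is a placeholder, not a real shortened link.
rich = quality_features(
    "RT @cnn: Small plane crashes at big airport http://t.co/abc #news"
)
plain = quality_features("hello world")
```

Each dictionary would then be flattened into five columns of the feature vector alongside the relevance and expansion scores.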

Page 12: Exploiting Ranking Factorization Machines for Microblog Retrieval

Experimental Setup

Dataset

TREC Tweet11 Corpus: about two weeks of Twitter data

TopicInfo Corpus: title field of link pages

TREC’11: 50 queries

TREC’12: 60 queries

Evaluation Metrics

P@30 & MAP

| HTTP Code | Status | # of tweets |
|---|---|---|
| 200 | OK | 8,084,724 |
| 302 | Found | 815,794 |
| 403 | Forbidden | 817,273 |
| 404 | Not Found | 868,667 |
| Null | Null | 67,011 |
| Searchable | | 8,900,518 |

Summary statistics of Tweet11 Corpus

| HTTP Code | Status | # of tweets |
|---|---|---|
| 200 | OK | 1,225,947 |
| 302 | Found | 688 |
| 403 | Forbidden | 5,050 |
| 404 | Not Found | 92,378 |
| Null | Null | 265,468 |
| Searchable | | 1,226,635 |

Summary statistics of TopicInfo Corpus
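For reference, the two evaluation metrics named on this slide, P@30 and MAP, can be computed as follows. This is a generic sketch of the standard definitions, not the official evaluation tool; the function names and the toy rankings are illustrative.

```python
def precision_at_k(ranked_ids, relevant, k=30):
    """Fraction of the top-k results that are relevant (divides by k)."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant) / k


def average_precision(ranked_ids, relevant):
    """Mean of the precision values at the rank of each relevant result found."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average of per-query AP; runs is a list of (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with made-up document ids.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
p_at_2 = precision_at_k(runs[0][0], runs[0][1], k=2)
map_score = mean_average_precision(runs)
```

For the first query, the relevant tweets sit at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2.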

Page 13: Exploiting Ranking Factorization Machines for Microblog Retrieval

Baselines

KL2SFBLoc [3]

Expanded language model with two-stage query expansion

Performed very well in the TREC’11 real-time search task

hitURLrun3 [4]

Uses a logistic regression model to learn a pairwise ranking for microblog retrieval

Best-performing system in the TREC’12 real-time search task

RSVM_Full

Ranking SVM with linear kernel

Same feature set as Ranking FM

Page 14: Exploiting Ranking Factorization Machines for Microblog Retrieval

Ranking FM Performance

Ranking FM

| Metric | KL2SFBLoc | RSVM_Full | hitURLrun3 | RFM_FullSGD | RFM_FullAR |
|---|---|---|---|---|---|
| P@30 | 0.2441 | 0.2616 | 0.2701 | 0.2808 | 0.2746 |
| MAP | 0.2506 | 0.2597 | 0.2642 | 0.2694 | 0.2678 |

4% improvement on P@30 over the TREC’12 best

7% improvement on P@30
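The two percentages can be reproduced from the P@30 row of the table as relative gains of RFM_FullSGD. The pairing of each callout with a baseline is an inference from the arithmetic, not something the slide states: roughly 4% over hitURLrun3 and roughly 7% over RSVM_Full.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# P@30 values copied from the results table.
p30 = {
    "KL2SFBLoc": 0.2441,
    "RSVM_Full": 0.2616,
    "hitURLrun3": 0.2701,
    "RFM_FullSGD": 0.2808,
    "RFM_FullAR": 0.2746,
}
gain_over_hit = relative_improvement(p30["RFM_FullSGD"], p30["hitURLrun3"])
gain_over_rsvm = relative_improvement(p30["RFM_FullSGD"], p30["RSVM_Full"])
```

Rounded to whole percentages, these come out as 4 and 7, matching the callouts.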

Page 15: Exploiting Ranking Factorization Machines for Microblog Retrieval

Feature Study

[Figure: P@N versus N (0 to 30) for Ranking FM of k=3 optimized by SGD. Curves: Full, -Quality, -Document Expansion, -Query Expansion, -Content Relevance, Only Content Relevance.]

Page 16: Exploiting Ranking Factorization Machines for Microblog Retrieval

Influence of the hyper-parameter k

[Figure: two panels for RFM_FullSGD, P@30 versus k and MAP versus k, with k from 0 to 15. Ranking FM optimized by SGD.]

Page 17: Exploiting Ranking Factorization Machines for Microblog Retrieval

Stochastic Gradient Descent vs. Adaptive Regularization

[Figure: training time (s, axis scaled by 10^4) versus k (0 to 15) for Stochastic Gradient Descent and Adaptive Regularization.]

| Method | P@5 | P@10 | P@30 | MAP |
|---|---|---|---|---|
| RFM_FullSGD | 0.4068 | 0.3695 | 0.2808 | 0.2694 |
| RFM_FullAR | 0.4034 | 0.3678 | 0.2746 | 0.2678 |

Page 18: Exploiting Ranking Factorization Machines for Microblog Retrieval

Summary

Ranking FM Framework

Pairwise approach

Use Factorization Machines as ranking function

Two optimization methods

Stochastic Gradient Descent

Adaptive Regularization

Three groups of features

Content Relevance Features

Semantic Expansion Features

Quality Features

Page 19: Exploiting Ranking Factorization Machines for Microblog Retrieval

References

[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC-2011 Microblog Track. In Proceedings of TREC 2011, 2012.

[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pages 133–142. ACM, 2012.

[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. JCDL ’12, pages 267–276. ACM, 2012.

[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. HIT at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.
