context-sensitive ranking

SIGMOD 2006 Context-sensitive ranking Rakesh Agrawal Microsoft Search Labs Ralf Rantzau IBM Silicon Valley Lab Evimaria Terzi University of Helsinki & Microsoft Search Labs Work done largely while the authors were in IBM Almaden

Page 1: Context-sensitive ranking

SIGMOD 2006

Context-sensitive ranking

Rakesh Agrawal Microsoft Search Labs

Ralf Rantzau IBM Silicon Valley Lab

Evimaria Terzi University of Helsinki & Microsoft Search Labs

Work done largely while the authors were in IBM Almaden

Page 2: Context-sensitive ranking

The curse of abundance: too much data and too many answers

• Query shopping.com for a digital camera:

• Query Froogle for a tennis racquet:

Page 3: Context-sensitive ranking

Ranking query results

• Algorithms for ranking web pages have been quite successful ([BP’98, Kleinberg’98])

– Key idea: Exploit the graph of hyperlinks between web pages

• Can we take a similar approach to ranking database query results?

– Need for a graph structure that accurately describes the relationships between tuples in the database

– Past attempts: graphs derived from schema and key constraints, or from queries [BHP’04, BHNCS’02, GMT’04]

But are these graphs natural, or do they reflect design-optimization decisions?

Page 4: Context-sensitive ranking

Using preferences to induce a graph of tuples

     Genre (G)   Actor (A)   Title (T)              Language (L)
t1   Drama       Kidman      Birth                  English
t2   Drama       Cruz        Vanilla Sky            English
t3   Sci-Fi      Reeves      Matrix                 English
t4   Comedy      Cruz        Sin noticias de Dios   Spanish
t5   Comedy      Aniston     Rumor has it…          English

• Drama > Sci-Fi induces t1 > t3 and t2 > t3
• Kidman > Reeves induces t1 > t3
• Matrix > Birth induces t3 > t1

[Figure: the induced preference graph over t1, t2 and t3]

[ Preferences are predicates of the form “X=x1 > X=x2” ]
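A small Python sketch of this induction step follows. It is illustrative only, not the authors' code: the encoding of “X=x1 > X=x2” as an `(attribute, preferred value, other value)` triple and the helper name `induced_edges` are assumptions made for the example.

```python
# Toy relation from the slide: tuple id -> attribute values.
MOVIES = {
    "t1": {"G": "Drama", "A": "Kidman", "T": "Birth", "L": "English"},
    "t2": {"G": "Drama", "A": "Cruz", "T": "Vanilla Sky", "L": "English"},
    "t3": {"G": "Sci-Fi", "A": "Reeves", "T": "Matrix", "L": "English"},
    "t4": {"G": "Comedy", "A": "Cruz", "T": "Sin noticias de Dios", "L": "Spanish"},
    "t5": {"G": "Comedy", "A": "Aniston", "T": "Rumor has it…", "L": "English"},
}

# "X=x1 > X=x2" encoded as (attribute X, preferred value x1, other value x2); assumed encoding.
PREFERENCES = [("G", "Drama", "Sci-Fi"), ("A", "Kidman", "Reeves"), ("T", "Matrix", "Birth")]


def induced_edges(relation, preference):
    """Return every tuple pair (ti, tj) such that the preference implies ti > tj."""
    attr, better, worse = preference
    winners = [t for t, row in relation.items() if row[attr] == better]
    losers = [t for t, row in relation.items() if row[attr] == worse]
    return [(w, l) for w in winners for l in losers]


for pref in PREFERENCES:
    print(pref, induced_edges(MOVIES, pref))
```

The first two preferences and the third disagree on the pair t1, t3, which is why the induced graph contains both t1 > t3 and t3 > t1.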

Page 5: Context-sensitive ranking

Augment preferences with context

     Genre (G)   Actor (A)   Title (T)              Language (L)
t1   Drama       Kidman      Birth                  English
t2   Drama       Cruz        Vanilla Sky            English
t3   Sci-Fi      Reeves      Matrix                 English
t4   Comedy      Cruz        Sin noticias de Dios   Spanish
t5   Comedy      Aniston     Rumor has it…          English

– In general (*): English > Spanish | *

– But in the context of comedies: Spanish > English | Comedies

[ Contexts are predicates of the form “Y=a” ]

Page 6: Context-sensitive ranking

Preferences in the past

• Preferences expressed via a numeric score [AW’00, KI’04, KI’05]
– Nicole Kidman: 0.9
– Penelope Cruz: 0.4
– Dramas: 0.8
– Comedies: 0.3

• Pairwise preferences in the ML literature [CSS’97]

• Preferences as partial orders [Kießling’02]

• Preferences as first-order formulas [Chomicki’03]

Page 7: Context-sensitive ranking

Contextual preferences

     Genre (G)   Actor (A)   Title (T)              Language (L)
t1   Drama       Kidman      Birth                  English
t2   Drama       Cruz        Vanilla Sky            English
t3   Sci-Fi      Reeves      Matrix                 English
t4   Comedy      Cruz        Sin noticias de Dios   Spanish
t5   Comedy      Aniston     Rumor has it…          English

• P1 = {G=Drama > G=Sci-Fi | L=English}, inducing t1 > t3 | En and t2 > t3 | En

• P2 = {A=Kidman > A=Reeves | L=English}, inducing t1 > t3 | En

• P3 = {T=Matrix > T=Birth | L=English}, inducing t3 > t1 | En

[Figure: weighted preference graph over t1, t2, t3 in the context L=English, with edge weights t1→t3: 2/3, t3→t1: 1/3, t2→t3: 1, t1→t2: 1/2, t2→t1: 1/2]
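The weights in the figure are consistent with a simple rule: each direction of a tuple pair gets its share of the contextual preferences that apply to that pair (2 of 3 votes say t1 > t3), and a pair with no applicable preference gets 1/2 in each direction. The sketch below assumes exactly that rule, which is inferred from the numbers rather than stated on the slide, and it only creates nodes for tuples that some preference mentions; `preference_graph` and the 5-tuple encoding of a contextual preference are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

MOVIES = {
    "t1": {"G": "Drama", "A": "Kidman", "T": "Birth", "L": "English"},
    "t2": {"G": "Drama", "A": "Cruz", "T": "Vanilla Sky", "L": "English"},
    "t3": {"G": "Sci-Fi", "A": "Reeves", "T": "Matrix", "L": "English"},
    "t4": {"G": "Comedy", "A": "Cruz", "T": "Sin noticias de Dios", "L": "Spanish"},
    "t5": {"G": "Comedy", "A": "Aniston", "T": "Rumor has it…", "L": "English"},
}

# Contextual preference "X=x1 > X=x2 | Y=a" encoded as (X, x1, x2, Y, a); assumed encoding.
P_ENGLISH = [
    ("G", "Drama", "Sci-Fi", "L", "English"),
    ("A", "Kidman", "Reeves", "L", "English"),
    ("T", "Matrix", "Birth", "L", "English"),
]


def preference_graph(relation, prefs):
    """Weighted preference graph: {(ti, tj): weight of the edge ti -> tj}."""
    votes = Counter()
    for attr, better, worse, ctx_attr, ctx_val in prefs:
        # A contextual preference only applies to tuples that satisfy its context.
        in_ctx = {t: row for t, row in relation.items() if row[ctx_attr] == ctx_val}
        for w, w_row in in_ctx.items():
            for l, l_row in in_ctx.items():
                if w_row[attr] == better and l_row[attr] == worse:
                    votes[(w, l)] += 1
    # Assumption: only tuples mentioned by some induced preference become nodes.
    nodes = sorted({t for pair in votes for t in pair})
    weights = {}
    for a, b in combinations(nodes, 2):
        total = votes[(a, b)] + votes[(b, a)]
        if total == 0:
            # Assumption: no evidence either way gives 1/2 to each direction.
            weights[(a, b)] = weights[(b, a)] = Fraction(1, 2)
        else:
            # Each direction's weight is its share of the votes on this pair.
            weights[(a, b)] = Fraction(votes[(a, b)], total)
            weights[(b, a)] = Fraction(votes[(b, a)], total)
    return weights


W = preference_graph(MOVIES, P_ENGLISH)
for edge, weight in sorted(W.items()):
    print(edge, weight)
```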

Page 8: Context-sensitive ranking

Obtaining preferences

• Users provide preferences voluntarily – in the same way users rate products and services

• Preferences can be automatically collected via browser plug-ins or taskbars (with user permission)

• Preferences can be learned from past data

• Preferences can also be learned from the data (e.g., using association-rule mining)

Preferences are obtained from various sources and can contain cycles and contradictions, which are resolved democratically

Page 9: Context-sensitive ranking

Overview

Question: How can users' preferences be incorporated when ranking query results?

Approach:
• Accumulate contextual preferences of the form i1 > i2 | X

• Order the answer tuples so that the preferences are maximally respected, giving higher weight to preferences whose contexts more closely match the query

Page 10: Context-sensitive ranking

Issues

• How should the similarity between a query and a context be defined? (See the paper for the distance function.)

• Can we build orders in an offline step and use them at query time?

• Should we save all orders?

• How should the saved orders be combined when answering queries?

Page 11: Context-sensitive ranking

Problem decomposition

[Problem 1]: For every context X build an order τX (Ordering)

[Problem 2]: Given a set of orders Tm = {τ1,…, τm}, find ℓ representative orders Tℓ (ClusterOrders)

• Assign each of the input orders to one of the representatives (the closest)

• Associate with each representative σ a set of contexts Yσ

[Problem 3]: Provide the top-k results for the query Q (Querying)
– respecting the representative orders, and
– weighting that respect according to the similarity between the query and the contexts

Page 12: Context-sensitive ranking

Problem 1: The Ordering problem

For a given context X and a set of preferences PX over the tuples D={t1,…,tn} find an ordering τ of D such that

$$\tau = \arg\max_{\tau'} \mathrm{Agree}(\tau', P_X)$$

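[Figure: preference graph with edge weights t1→t3: 2/3, t3→t1: 1/3, t2→t3: 1, t1→t2: 1/2, t2→t1: 1/2, and the order t2, t1, t3]

Agree = 1 (t2→t3) + 1/2 (t2→t1) + 2/3 (t1→t3) = 13/6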
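Read this way, Agree(τ, P_X) is the total weight of the edges ti → tj that τ respects (ti placed before tj); the 13/6 example matches that reading. The sketch below implements the Ordering problem under that assumption by enumerating every permutation, so it is only usable on toy inputs (the problem is NP-hard in general). The names `agree` and `best_order` are illustrative.

```python
from fractions import Fraction as F
from itertools import permutations

# Edge weights of the example preference graph (edge a -> b means "a before b").
W = {
    ("t1", "t2"): F(1, 2), ("t2", "t1"): F(1, 2),
    ("t1", "t3"): F(2, 3), ("t3", "t1"): F(1, 3),
    ("t2", "t3"): F(1),
}


def agree(order, weights):
    """Total weight of the edges a -> b for which a precedes b in `order`."""
    pos = {t: i for i, t in enumerate(order)}
    return sum((w for (a, b), w in weights.items() if pos[a] < pos[b]), F(0))


def best_order(tuples, weights):
    """Exact argmax over all permutations: exponential, only for toy inputs."""
    return max(permutations(tuples), key=lambda p: agree(p, weights))


print(agree(("t2", "t1", "t3"), W))  # 13/6
best = best_order(["t1", "t2", "t3"], W)
print(best, agree(best, W))
```

On this graph, t1, t2, t3 and t2, t1, t3 tie at 13/6, so the brute-force search returns whichever of them it enumerates first.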

Page 13: Context-sensitive ranking

Problem 2: The ClusterOrders problem

Given m orders Tm = {τ1,…,τm}, each corresponding to a single context Xi, find ℓ representative orders Tℓ such that Cost(Tℓ) is minimized, where

$$\mathrm{Cost}(T_\ell) = \sum_{\tau \in T_m} d(\tau, T_\ell)$$

and

$$d(\tau, T_\ell) = \min_{\sigma \in T_\ell} d(\tau, \sigma)$$

We use the standard Spearman footrule and Kendall tau distances for comparing orderings
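The two distances and the cost function translate directly into code. In this sketch an order is any sequence of distinct items (strings such as "abc" are used for brevity), and the function names are illustrative, not from the paper.

```python
from itertools import combinations


def kendall_tau(s, t):
    """Number of item pairs ordered differently by orders s and t."""
    pos_t = {x: i for i, x in enumerate(t)}
    # combinations(s, 2) yields pairs (a, b) with a before b in s.
    return sum(1 for a, b in combinations(s, 2) if pos_t[a] > pos_t[b])


def footrule(s, t):
    """Spearman footrule: total displacement of items between s and t."""
    pos_t = {x: i for i, x in enumerate(t)}
    return sum(abs(i - pos_t[x]) for i, x in enumerate(s))


def dist_to_set(tau, reps, d):
    """d(tau, T_l) = min over representatives sigma of d(tau, sigma)."""
    return min(d(tau, sigma) for sigma in reps)


def cost(inputs, reps, d):
    """Cost(T_l) = sum over input orders tau of d(tau, T_l)."""
    return sum(dist_to_set(tau, reps, d) for tau in inputs)


inputs = ["abc", "acb", "cba"]  # each string is an order over the items a, b, c
print(cost(inputs, ["abc"], kendall_tau), cost(inputs, ["abc", "cba"], kendall_tau))
```

With the single representative abc the Kendall tau cost is 0 + 1 + 3 = 4; adding cba as a second representative lowers it to 1.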

Page 14: Context-sensitive ranking

The ClusterOrders problem: Example

[Figure: input orders over the items a, b, c, d, e, f (copies of a b c d e f and of its reverse f e d c b a), clustered around two representative orders τ1 and τ2]

Cost(τ1) = 2, Cost(τ2) = 1

Cost(τ1, τ2) = 2 + 1 = 3

Page 15: Context-sensitive ranking

Problem 3: The Querying problem

Provide the top-k results for query Q, respecting the representative orders and weighting that respect by the similarity between Q and each representative's set of contexts

Page 16: Context-sensitive ranking

Problem decomposition

Page 17: Context-sensitive ranking

Constructing orders from preferences [Problem 1]

• The problem is NP-hard; heuristics are needed

• PickPerm algorithm: pick a random permutation, reverse it, and keep the better of the two

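[Figure: preference graph with edge weights t1→t3: 2/3, t3→t1: 1/3, t2→t3: 1, t1→t2: 1/2, t2→t1: 1/2]

Random permutation t2, t3, t1: A = 11/6
Its reverse t1, t3, t2: A = 7/6
PickPerm keeps t2, t3, t1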

[ Inspired by the 2-approximation algorithm for finding the maximum acyclic subgraph of a given graph ]
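A sketch of PickPerm as described on this slide, with agreement measured as the total weight of respected edges (an interpretation consistent with the numbers above; the names are illustrative). Because every edge is respected by exactly one of an order and its reverse, the two agreements sum to the total edge weight, here 3, so the kept order always gets at least half of it; this is the idea behind the 2-approximation mentioned in the note.

```python
import random
from fractions import Fraction as F

# Example graph: edge a -> b with weight w means "a before b" earns w.
W = {
    ("t1", "t2"): F(1, 2), ("t2", "t1"): F(1, 2),
    ("t1", "t3"): F(2, 3), ("t3", "t1"): F(1, 3),
    ("t2", "t3"): F(1),
}


def agree(order, weights):
    """Total weight of the edges a -> b for which a precedes b in `order`."""
    pos = {t: i for i, t in enumerate(order)}
    return sum((w for (a, b), w in weights.items() if pos[a] < pos[b]), F(0))


def pick_perm(tuples, weights, rng=random):
    """Shuffle, reverse, and keep whichever of the two orders agrees more."""
    perm = list(tuples)
    rng.shuffle(perm)
    rev = perm[::-1]
    # agree(perm) + agree(rev) equals the total edge weight, so the
    # better of the two keeps at least half of it.
    return max((perm, rev), key=lambda p: agree(p, weights))


print(agree(["t2", "t3", "t1"], W), agree(["t1", "t3", "t2"], W))  # 11/6 7/6
print(pick_perm(["t1", "t2", "t3"], W, random.Random(0)))
```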

Page 18: Context-sensitive ranking

Greedy algorithm [CSS’97]

• At the i-th iteration pick the i-th element of the output permutation

• At each iteration, pick the tuple t with the highest s_val(t) = OutDegree(t) - InDegree(t) in the remaining (weighted) preference graph

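[Figure: greedy run on the preference graph with edge weights t1→t3: 2/3, t3→t1: 1/3, t2→t3: 1, t1→t2: 1/2, t2→t1: 1/2]

Iteration 1: s_val(t2) = 1, s_val(t1) = 1/3, s_val(t3) = -4/3; pick t2
Iteration 2 (t1 and t3 remain): s_val(t1) = 2/3 - 1/3 = 1/3, s_val(t3) = 1/3 - 2/3 = -1/3; pick t1
Iteration 3: pick t3

Output: t2, t1, t3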
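The greedy procedure can be sketched as follows, using weighted degrees on the example graph. The alphabetical tie-breaking and the function names are assumptions, since the slide does not say how ties are broken.

```python
from fractions import Fraction as F

# Example graph: edge a -> b with weight w means "a before b" earns w.
W = {
    ("t1", "t2"): F(1, 2), ("t2", "t1"): F(1, 2),
    ("t1", "t3"): F(2, 3), ("t3", "t1"): F(1, 3),
    ("t2", "t3"): F(1),
}


def s_val(t, remaining, weights):
    """Weighted out-degree minus in-degree of t within the remaining graph."""
    out_w = sum((w for (a, b), w in weights.items() if a == t and b in remaining), F(0))
    in_w = sum((w for (a, b), w in weights.items() if b == t and a in remaining), F(0))
    return out_w - in_w


def greedy_order(tuples, weights):
    remaining = set(tuples)
    order = []
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda t: s_val(t, remaining, weights))
        order.append(best)
        remaining.remove(best)  # best's edges drop out of the remaining graph
    return order


print(greedy_order(["t1", "t2", "t3"], W))  # ['t2', 't1', 't3']
```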

Page 19: Context-sensitive ranking

MC-algorithm

• Reverse the directions of the edges on the preference graph

• Run a random walk (with random restarts) on the reversed graph

• Rank according to the stationary distribution
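The slide does not specify transition probabilities or the restart rate. The sketch below assumes that the walk leaves a tuple along a reversed edge with probability proportional to its weight, restarts at a uniformly random tuple with probability 0.15 (a hypothetical value), and approximates the stationary distribution by power iteration. It is a PageRank-style illustration, not the paper's implementation.

```python
from fractions import Fraction as F

# Example graph: edge a -> b with weight w means "a preferred to b" with strength w.
W = {
    ("t1", "t2"): F(1, 2), ("t2", "t1"): F(1, 2),
    ("t1", "t3"): F(2, 3), ("t3", "t1"): F(1, 3),
    ("t2", "t3"): F(1),
}


def mc_rank(tuples, weights, restart=0.15, iters=200):
    """Rank tuples by the stationary distribution of a random walk with restarts
    on the reversed preference graph, approximated by power iteration."""
    n = len(tuples)
    # Reverse every edge a -> b into b -> a, so the walk drifts from less
    # preferred toward more preferred tuples.
    out = {t: {} for t in tuples}
    for (a, b), w in weights.items():
        if w > 0:
            out[b][a] = out[b].get(a, 0.0) + float(w)
    p = {t: 1.0 / n for t in tuples}
    for _ in range(iters):
        nxt = {t: restart / n for t in tuples}  # restart: jump to a uniform tuple
        for u in tuples:
            total = sum(out[u].values())
            if total == 0:
                for v in tuples:  # no outgoing edge: spread the mass uniformly
                    nxt[v] += (1 - restart) * p[u] / n
            else:
                for v, w in out[u].items():  # follow an edge proportionally to its weight
                    nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return sorted(tuples, key=lambda t: -p[t]), p


order, dist = mc_rank(["t1", "t2", "t3"], W)
print(order, {t: round(x, 3) for t, x in dist.items()})
```

On the example graph the sketch ranks t1, t2, t3; that order respects edges of weight 1/2 + 2/3 + 1 = 13/6, the same agreement as t2, t1, t3.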

Page 20: Context-sensitive ranking

Performance

• Data generation
– Fix an order on the tuples
– Generate preferences that respect this order
– pc: the probability that a preference is generated between a pair of tuples

• Observations
– For small pc values, many orders are compatible with the preferences, and all algorithms perform well
– For large pc values, MC and Greedy find the optimal order
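The data-generation step might look like the sketch below; the function name and signature are illustrative. Every generated preference agrees with the hidden order, so that order always has maximal agreement. With pc = 1 every pair is constrained and the hidden order is the only one that respects all preferences, whereas small pc values leave many compatible orders.

```python
import random


def generate_preferences(true_order, p_c, rng):
    """Emit (better, worse) pairs consistent with true_order; each pair of
    tuples gets a preference independently with probability p_c."""
    prefs = []
    for i in range(len(true_order)):
        for j in range(i + 1, len(true_order)):
            if rng.random() < p_c:
                prefs.append((true_order[i], true_order[j]))
    return prefs


truth = [f"t{i}" for i in range(1, 11)]
sample = generate_preferences(truth, 0.3, random.Random(42))
print(len(sample), sample[:3])
```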

Page 21: Context-sensitive ranking

Problem decomposition

Page 22: Context-sensitive ranking

Reducing the number of orders [Problem 2]

• Finding ℓ representative orders is NP-hard

• Finding ℓ orders among the input ones gives a good approximation, but is still hard

• Need for heuristics

• Greedy algorithm
– At each step, add the input order that results in the minimum cost

• Furthest algorithm
– Start by picking a random order τ and adding it to the output set of orders Tℓ
– For ℓ-1 iterations, pick the order that is furthest away from the orders already in Tℓ
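Both heuristics are sketched below with Kendall tau as the distance and representatives restricted to input orders. The names are hypothetical, and the sketch assumes ℓ does not exceed the number of distinct input orders.

```python
import random
from itertools import combinations


def kendall_tau(s, t):
    """Number of item pairs ordered differently by orders s and t."""
    pos_t = {x: i for i, x in enumerate(t)}
    return sum(1 for a, b in combinations(s, 2) if pos_t[a] > pos_t[b])


def cost(inputs, reps):
    """Sum over inputs of the Kendall tau distance to the closest representative."""
    return sum(min(kendall_tau(tau, s) for s in reps) for tau in inputs)


def greedy_reps(inputs, ell):
    """Repeatedly add the input order whose addition gives the lowest cost."""
    reps = []
    for _ in range(ell):
        candidates = [t for t in inputs if t not in reps]
        reps.append(min(candidates, key=lambda t: cost(inputs, reps + [t])))
    return reps


def furthest_reps(inputs, ell, rng):
    """Start from a random input order, then add the order furthest from those chosen."""
    reps = [rng.choice(inputs)]
    while len(reps) < ell:
        candidates = [t for t in inputs if t not in reps]
        reps.append(max(candidates, key=lambda t: min(kendall_tau(t, s) for s in reps)))
    return reps


inputs = ["abcd", "abdc", "bacd", "dcba", "dcab", "cdba"]
g = greedy_reps(inputs, 2)
f = furthest_reps(inputs, 2, random.Random(1))
print(g, cost(inputs, g), f, cost(inputs, f))
```

On this toy input every single order has the same cost (18), so Greedy's first pick is a tie broken by list order; its second pick yields the pair abcd, dcba with cost 4. Furthest always adds an order at Kendall distance 6 from its random start, which yields cost 4 or 6 depending on that start.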

Page 23: Context-sensitive ranking

Refine the representative orders

• Given the set of representative orders Tℓ, assign each input order τ ∈ Tm to its closest representative in Tℓ (this partitions Tm into ℓ parts)*

– Discrete refinement: for each partition, pick the best representative from among the orders in that partition

– Continuous refinement ([DKNS’01]): for each partition, find the best representative among all possible orders

*Notice the resemblance between this problem and the Catalog Segmentation problem of [KPR’04]
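Discrete refinement resembles a k-medoids iteration: reassign each input order, re-pick each part's medoid among its own members, and repeat until the representatives stop changing. The sketch below assumes Kendall tau distance and uses hypothetical names; continuous refinement, which searches over all orders, is not shown.

```python
from itertools import combinations


def kendall_tau(s, t):
    """Number of item pairs ordered differently by orders s and t."""
    pos_t = {x: i for i, x in enumerate(t)}
    return sum(1 for a, b in combinations(s, 2) if pos_t[a] > pos_t[b])


def assign(inputs, reps):
    """Partition the inputs by closest representative (ties go to the earlier one)."""
    parts = [[] for _ in reps]
    for tau in inputs:
        best = min(range(len(reps)), key=lambda j: kendall_tau(tau, reps[j]))
        parts[best].append(tau)
    return parts


def discrete_refine(inputs, reps, rounds=10):
    """Alternate assignment and re-picking each part's medoid until nothing changes."""
    for _ in range(rounds):
        parts = assign(inputs, reps)
        new_reps = [
            min(part, key=lambda c: sum(kendall_tau(c, tau) for tau in part)) if part else rep
            for part, rep in zip(parts, reps)
        ]
        if new_reps == reps:
            break
        reps = new_reps
    return reps


inputs = ["abcd", "abdc", "bacd", "dcba", "dcab", "cdba"]
print(discrete_refine(inputs, ["abdc", "cdba"]))  # ['abcd', 'dcba']
```

Starting from the representatives abdc and cdba, one round moves them to abcd and dcba, and the next round leaves them unchanged.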

Page 24: Context-sensitive ranking

Performance

• Data generation
– Fix ℓ underlying orders T
– Generate the other orders from T by picking an order in T and adding noise (swaps)
– Compute the cost of the solution with respect to the ground truth

• Observations
– Without refinements: Greedy performs consistently better than Furthest
– With refinements: both algorithms are equally good
– The groupings are equivalent
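A sketch of this noisy-instance generator, assuming that each unit of noise swaps two randomly chosen positions (the slide only says “swaps”). The function names and the swap count are hypothetical parameters.

```python
import random


def noisy_copy(order, n_swaps, rng):
    """Copy `order` and swap n_swaps randomly chosen pairs of positions."""
    out = list(order)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def make_instance(ground_truth, m, n_swaps, rng):
    """m input orders, each a noisy copy of a randomly chosen ground-truth order."""
    return [noisy_copy(rng.choice(ground_truth), n_swaps, rng) for _ in range(m)]


rng = random.Random(3)
truth = [list("abcdef"), list("fedcba")]
instance = make_instance(truth, 20, 2, rng)
print(["".join(o) for o in instance[:4]])
```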

Page 25: Context-sensitive ranking

Problem decomposition

Page 26: Context-sensitive ranking

Problem 3: The Querying problem

• Use a variation of the TA algorithms [FLN’02, FKS’03]
– Assume k = 2 and a query Q such that sim(Q, Y1) = 0.5, sim(Q, Y2) = 0.3, sim(Q, Y3) = 0.1

Y1, T1 (sim 0.5)   Y2, T2 (sim 0.3)   Y3, T3 (sim 0.1)
t1  5              t2  5              t4  5
t2  4              t3  4              t3  4
t3  3              t1  3              t1  3
t4  2              t4  2              t5  2
t5  1              t5  1              t2  1

Page 27: Context-sensitive ranking

Problem 3: The Querying problem

1. At each sequential access:
   a. Set the threshold TH to the aggregate of the scores seen in this access

TH =0.5*5+0.3*5+0.1*5=4.5

Page 28: Context-sensitive ranking

Problem 3: The Querying problem

1. At each sequential access:
   b. Do random accesses and compute the scores of the objects seen

TH =0.5*5+0.3*5+0.1*5=4.5

t1 3.7

t2 3.6

t4 2.1

Page 29: Context-sensitive ranking

Problem 3: The Querying problem

1. At each sequential access:
   b. Do random accesses and compute the scores of the objects seen

TH =0.5*5+0.3*5+0.1*5=4.5

t1 3.7

t2 3.6

Page 30: Context-sensitive ranking

Problem 3: The Querying problem

1. At each sequential access:
   c. Maintain a list of the top-k objects seen so far

TH =0.5*5+0.3*5+0.1*5=4.5

t1 3.7

t2 3.6

Page 31: Context-sensitive ranking

Problem 3: The Querying problem

1. At each sequential access:
   d. When the scores of all top-k objects are greater than or equal to the threshold, stop

TH =0.5*4+0.3*4+0.1*4=3.6

t1 3.7

t2 3.6
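The walkthrough can be condensed into a short TA-style sketch over the three weighted lists. It assumes every tuple appears in every list, uses exact fractions so that the final 3.6 versus 3.6 comparison is not disturbed by floating-point rounding, and is an illustration rather than the paper's variant of TA; `threshold_topk` is a hypothetical name.

```python
from fractions import Fraction as F

# (sim(Q, Y_i), list T_i sorted by descending score), from the example above.
LISTS = [
    (F(1, 2), [("t1", 5), ("t2", 4), ("t3", 3), ("t4", 2), ("t5", 1)]),
    (F(3, 10), [("t2", 5), ("t3", 4), ("t1", 3), ("t4", 2), ("t5", 1)]),
    (F(1, 10), [("t4", 5), ("t3", 4), ("t1", 3), ("t5", 2), ("t2", 1)]),
]


def threshold_topk(lists, k):
    """TA-style top-k over weighted sorted lists; returns (top-k, access depth)."""
    lookup = [dict(lst) for _, lst in lists]  # random access: tuple -> score
    scores = {}
    depth = max(len(lst) for _, lst in lists)
    for d in range(depth):
        th = F(0)
        for w, lst in lists:  # one sequential access per list
            t, s = lst[d]
            th += w * s
            if t not in scores:  # random accesses complete t's weighted score
                scores[t] = sum(wi * look[t] for (wi, _), look in zip(lists, lookup))
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        if len(top) == k and all(s >= th for _, s in top):
            return top, d + 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k], depth


top, depth = threshold_topk(LISTS, 2)
print([(t, float(s)) for t, s in top], depth)  # [('t1', 3.7), ('t2', 3.6)] 2
```

It stops after two sequential accesses with t1 (3.7) and t2 (3.6), as in the walkthrough.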

Page 32: Context-sensitive ranking

Accuracy of top-k results

• IMDB dataset
– Automatically generate preferences via association-rule mining: ‘A1=a’ > ‘A1=b’ | X if conf(X → a) > conf(X → b)

– Sol_k: top-k results obtained after clustering

– G_k: top-k results without clustering

$$\mathrm{Accuracy}(\mathrm{Sol}_k, G_k, k) = \frac{|\mathrm{Sol}_k \cap G_k|}{|\mathrm{Sol}_k|}$$
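Both the preference-mining rule and the accuracy measure are easy to sketch. The rows below are hypothetical toy data, not IMDB; the accuracy follows the reading |Sol_k ∩ G_k| / |Sol_k| of the formula above; and the helper names are assumptions.

```python
def confidence(rows, context, target):
    """conf(X -> A=a): share of the rows matching context X that also have A=a."""
    attr, value = target
    matching = [r for r in rows if all(r[k] == v for k, v in context.items())]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r[attr] == value) / len(matching)


def mined_preference(rows, context, attr, a, b):
    """'A=a' > 'A=b' | X if conf(X -> a) > conf(X -> b); None on a tie."""
    ca = confidence(rows, context, (attr, a))
    cb = confidence(rows, context, (attr, b))
    if ca > cb:
        return (attr, a, b)
    if cb > ca:
        return (attr, b, a)
    return None


def accuracy(sol_k, g_k):
    """|Sol_k ∩ G_k| / |Sol_k|."""
    return len(set(sol_k) & set(g_k)) / len(sol_k)


# Hypothetical toy rows (not IMDB data).
ROWS = [
    {"G": "Comedy", "L": "Spanish"}, {"G": "Comedy", "L": "Spanish"},
    {"G": "Drama", "L": "Spanish"}, {"G": "Drama", "L": "English"},
    {"G": "Drama", "L": "English"}, {"G": "Comedy", "L": "English"},
]
print(mined_preference(ROWS, {"L": "Spanish"}, "G", "Drama", "Comedy"))
print(accuracy(["t1", "t2", "t3"], ["t1", "t3", "t5"]))
```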

Page 33: Context-sensitive ranking

Accuracy of top-k results

Page 34: Context-sensitive ranking

Recap

• Notion of contextual preferences

• Use of contextual preferences to order database results

• Use of association rules to obtain contextual preferences

• Experimental validation of the effectiveness of the proposed techniques using both synthetic and real data

Page 35: Context-sensitive ranking

Conclusions and future work

• The framework of contextual preferences is both intuitive and practical

• The framework can easily be extended to accommodate top-k lists and bucket orders

• Scalability of the algorithms needs further investigation

Page 36: Context-sensitive ranking

Questions?