Quicklink Selection for Navigational Query Results

Deepayan Chakrabarti (deepay@yahoo-inc.com), Ravi Kumar (ravikuma@yahoo-inc.com), Kunal Punera (kpunera@yahoo-inc.com)

Posted on 15-Jan-2016


Page 1

Quicklink Selection for Navigational Query Results

Deepayan Chakrabarti (deepay@yahoo-inc.com)

Ravi Kumar (ravikuma@yahoo-inc.com)

Kunal Punera (kpunera@yahoo-inc.com)

Page 2

What are quicklinks?

[Figure: a search result, with the result website and its quicklinks labeled]

Page 3

- Quicklinks (QLs) = URLs within the search result website
- They enable fast navigation to important parts of the website
- Which URLs should be QLs?

[Figure: a search result, with the result website and its quicklinks labeled]

Page 4

Quicklink Selection

Some obvious strategies don’t work very well:
- Top clicked URLs in the search engine
  - The URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps”, not for searches on “Univ. of Texas”
  - The URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com
  - URL popularity may be time-sensitive: nytimes.com/election-guide/2008/ for nytimes.com

Page 5

Quicklink Selection

Some obvious strategies don’t work very well:
- Top clicked URLs in the search engine
- Top visited URLs in toolbar data
  - May not relate to search activity, e.g., for nytimes.com, #3 is nytimes.com/mem/emailthis.html, #6 is nytimes.com/auth/login, and #8 is nytimes.com/gst/regi.html

Page 6

Quicklink Selection

Some obvious strategies don’t work very well:
- Top clicked URLs in the search engine
- Top visited URLs in toolbar data
- Top URLs from analysis of the hyperlink graph
  - Ignores the preferences of search users; toolbar data is more representative
- Heavily tagged URLs (e.g., del.icio.us/digg)
  - Low coverage: too few websites

Page 7

Quicklink Selection

Need a combined approach:
- Search logs
- Toolbar data
- Web-server logs
- Website hyperlink graph
- User tags

This paper uses search logs and toolbar data.

Page 8

Related Work

- Sitemap generation [Perkowitz+/00]
- Detection of hard-to-find URLs [Srikant+/01]
- Improving website navigability [Doerr+/07]
- Mining Web usage patterns [Buchner/99, Cadez+/03]
- BrowseRank [Liu+/08]
- Post-search browsing behavior [Bilenko+/08]

We focus on QLs in the context of search.

Page 9

Outline

- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions

Page 10

Problem Formulation

Which k URLs should be QLs?

“The greatest good for the greatest number”

- QLs save clicks
- Maximize the total number of clicks saved using at most k QLs
- But when exactly is a click “saved”?

Page 11

Problem Formulation

When does a QL get clicked by the user?

[Figure: graph of click trails (toolbar data) starting at nasa.gov, with pages such as “Hubble telescope” and “Photos”; annotation: “Say we pick this node as a QL”]

Page 12

Problem Formulation

[Figure: graph of click trails (toolbar data) starting at nasa.gov, with pages such as “Hubble telescope” and “Photos”; annotation: “Say we pick this node as a QL”]

Assumption: the user recognizes if SearchResult → QL → Destination, i.e., the user clicks the QL when it lies on the trail from the search result to the page they want.

Page 13

Problem Formulation

[Figure: graph of click trails (toolbar data) starting at nasa.gov; annotations: “Say we pick this node as a QL”, “(saves 1 click each)”]

Assumption: the user recognizes if SearchResult → QL → Destination.

Page 14

Problem Formulation

[Figure: graph of click trails (toolbar data) starting at nasa.gov with the chosen QL; trail groups annotated “(saves 1 click each)”, “(saves 2 clicks each)”, “(saves 0)”, “(saves 0)”]

Total savings = 1×3 + 2×2 = 7 clicks

Assumption: the user recognizes if SearchResult → QL → Destination.
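The click counting can be sketched in a few lines. This is a minimal sketch, assuming each toolbar trail is a list of URLs whose first element is the search-result page, and that a QL saves as many clicks as its position along the trail. The function name `single_ql_savings` and the toy trails (`"QL"`, `"a"`, `"b"`, `"x"`) are hypothetical, chosen only to reproduce the 1×3 + 2×2 = 7 example.

```python
from typing import Sequence


def single_ql_savings(trails: Sequence[Sequence[str]], ql: str) -> int:
    """Total clicks saved if `ql` is shown as the only quicklink (hypothetical model).

    trail[0] is the search-result page. Clicking the QL straight from the results
    page skips every click from trail[0] up to the QL, so the saving on one trail
    is the QL's last position in that trail (0 if the trail never visits it).
    """
    total = 0
    for trail in trails:
        positions = [i for i, url in enumerate(trail) if i > 0 and url == ql]
        if positions:
            total += positions[-1]
    return total


# Hypothetical toy trails mirroring the nasa.gov picture: 3 trails save 1 click,
# 2 trails save 2 clicks, and 2 trails never pass through the QL.
trails = (
    [["nasa.gov", "QL", "x"]] * 3
    + [["nasa.gov", "a", "QL"]] * 2
    + [["nasa.gov", "b"]] * 2
)
print(single_ql_savings(trails, "QL"))  # 7
```

Using the last occurrence is a choice for trails that revisit a page; the slides do not say how such revisits are handled.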

Page 15

Problem Formulation

However…
- Unknown pages might become QLs

[Figure: lyrics.com with alphabetical index pages A, B, C, …, Z]

These could become the “best” QLs.

Page 16

Problem Formulation

However…
- Unknown pages might become QLs
- Automatic-redirect pages might become QLs: nytimes.com forces logging in; aaa.com forces zip-code entry

We need QLs that are “noticeable” in a search context.

Page 17

Problem Formulation

How can we estimate noticeability? Via search click logs.

Noticeability of a URL u: the user notices a useful QL with probability α(u), computed from the fraction of search clicks for u on the website and a tuning parameter (≈ 2).

Page 18

Problem Formulation

[Figure: graph of click trails starting at nasa.gov with two chosen QLs, QL1 and QL2; two trail groups are annotated “(saves 0)”]

Assumption: the user picks the best QL that he/she notices.

Expected savings per trail group (# trails × probability × # clicks saved):
- 2 × α1 × 2
- 1 × α1 × 1
- 2 × (1-α2)α1 × 1
- 2 × α2 × 2

Total = 5α1 + 4α2 + 2α1(1-α2)

Page 19

Problem Formulation

[Figure: graph of click trails starting at nasa.gov with two chosen QLs, QL1 and QL2; two trail groups are annotated “(saves 0)”]

Expected savings per trail group (# trails × probability × # clicks saved):
- 2 × α1 × 2
- 1 × α1 × 1
- 2 × (1-α2)α1 × 1
- 2 × α2 × 2

Total = 5α1 + 4α2 + 2α1(1-α2)

- If only QL1 is perfectly noticeable (α1 = 1, α2 = 0): Total = 7 clicks (as if there were only 1 QL)
- If both QLs are perfectly noticeable (α1 = 1, α2 = 1): Total = 9 clicks
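The “best noticed QL” rule can be turned into an expected-savings function. This is a minimal sketch, assuming that the user notices each QL independently with probability α and that a QL's saving on a trail is its position after the search-result page. The toy trails are hypothetical, built so that their groups match the table rows (two trails contain both QL1 and QL2).

```python
from typing import Dict, Sequence, Set


def expected_saved_clicks(
    trails: Sequence[Sequence[str]], qls: Set[str], alpha: Dict[str, float]
) -> float:
    """Expected clicks saved by the QL set `qls` (hypothetical model).

    On each trail, the QLs that lie on it are scanned from largest saving to
    smallest; QL u is noticed independently with probability alpha[u], and the
    user clicks the first (i.e., best) QL they notice.
    """
    total = 0.0
    for trail in trails:
        saving = {}  # QL -> clicks it saves on this trail (its last position)
        for pos, url in enumerate(trail):
            if pos > 0 and url in qls:
                saving[url] = pos
        p_nothing_noticed_yet = 1.0
        for url in sorted(saving, key=saving.get, reverse=True):
            total += saving[url] * alpha[url] * p_nothing_noticed_yet
            p_nothing_noticed_yet *= 1.0 - alpha[url]
    return total


# Hypothetical trails matching the table: groups save 2/α1, 1/α1, {2/α2 or 1/α1}, 0.
trails = (
    [["nasa.gov", "a", "QL1", "d"]] * 2   # QL1 saves 2
    + [["nasa.gov", "QL1", "e"]]          # QL1 saves 1
    + [["nasa.gov", "QL1", "QL2"]] * 2    # QL2 saves 2; QL1 saves 1
    + [["nasa.gov", "b"]] * 2             # no QL on the trail
)
print(expected_saved_clicks(trails, {"QL1", "QL2"}, {"QL1": 1.0, "QL2": 0.0}))  # 7.0
print(expected_saved_clicks(trails, {"QL1", "QL2"}, {"QL1": 1.0, "QL2": 1.0}))  # 9.0
```

For general α1 and α2 the function returns 4α1 + α1 + 2[2α2 + (1-α2)α1], which is the sum of the four table rows.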

Page 20

Problem Formulation

Which k URLs should be QLs?

Maximize the expected number of clicks saved using at most k QLs, while incorporating “noticeability”.
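Under the two assumptions above, and additionally assuming that a user notices different QLs independently, the objective can be written out as below. This is a sketch consistent with the worked nasa.gov example, not necessarily the authors' notation: for a trail t, let u_1, u_2, … be the QLs of S that lie on t, sorted by decreasing click saving s_t(u).

$$
\max_{S:\ |S| \le k} f(S), \qquad
f(S) = \sum_{t} \sum_{i} s_t(u_i)\, \alpha(u_i) \prod_{j < i} \bigl(1 - \alpha(u_j)\bigr)
$$

Each inner term is the probability that u_i is the best QL the user notices, times the clicks it saves. For the two trails that contain both QLs in the example, this gives 2 × [2α2 + 1 × (1-α2)α1], matching the table rows 2 × α2 × 2 and 2 × (1-α2)α1 × 1.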

Page 21

Outline

- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions

Page 22

Algorithms

Maximizing the expected number of saved clicks using k QLs is NP-hard.

Theorem: this objective is non-decreasing submodular:
1. Non-negative
2. Adding QLs never hurts
3. “Diminishing returns”: for QL sets S ⊆ S′ and a URL u, the marginal improvement to set S is at least the marginal improvement to superset S′, where f(S) is the expected number of saved clicks with QL set S:

$$
f(S \cup \{u\}) - f(S) \;\ge\; f(S' \cup \{u\}) - f(S')
$$
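Diminishing returns can be checked by brute force on a small example. This sketch assumes the same hypothetical “best noticed QL” model with independent noticing (repeated here so the block is self-contained); the trails, candidate URLs, and α values are made up for illustration. It is a sanity check on one instance, not a proof of the theorem.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence, Set


def expected_saved_clicks(trails: Sequence[Sequence[str]], qls: Set[str],
                          alpha: Dict[str, float]) -> float:
    """Hypothetical model: the user clicks the best QL they notice on the trail."""
    total = 0.0
    for trail in trails:
        saving = {u: i for i, u in enumerate(trail) if i > 0 and u in qls}
        p_none = 1.0
        for u in sorted(saving, key=saving.get, reverse=True):
            total += saving[u] * alpha[u] * p_none
            p_none *= 1.0 - alpha[u]
    return total


def diminishing_returns_holds(trails: Sequence[Sequence[str]], candidates: Iterable[str],
                              alpha: Dict[str, float], tol: float = 1e-12) -> bool:
    """Check f(S+u) - f(S) >= f(S'+u) - f(S') for every S ⊆ S' and every u not in S'."""
    cands = sorted(candidates)
    subsets = [set(c) for r in range(len(cands) + 1) for c in combinations(cands, r)]

    def f(s: Set[str]) -> float:
        return expected_saved_clicks(trails, s, alpha)

    for big in subsets:
        for small in subsets:
            if not small <= big:
                continue
            for u in cands:
                if u in big:
                    continue
                if f(small | {u}) - f(small) < f(big | {u}) - f(big) - tol:
                    return False
    return True


trails = (
    [["nasa.gov", "a", "QL1", "d"]] * 2
    + [["nasa.gov", "QL1", "e"]]
    + [["nasa.gov", "QL1", "QL2"]] * 2
    + [["nasa.gov", "b"]] * 2
)
alpha = {"a": 0.3, "QL1": 0.8, "QL2": 0.5, "d": 0.6}
print(diminishing_returns_holds(trails, alpha.keys(), alpha))  # True
```

Under this model, the saving on one trail is the expected maximum of independent non-negative values, which is why the check passes.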

Page 23

Algorithms

Greedy algorithm: iteratively pick the QL that increases the number of saved clicks the most.
- Within a factor (1-1/e) of OPT [Nemhauser+/’78]
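The greedy rule can be sketched as follows. This is a minimal sketch, reusing the hypothetical expected-savings model from the problem formulation (repeated so the block runs on its own). The function names, the default α of 1.0 for URLs without an estimate, and the toy trails are assumptions. The sketch recomputes the objective from scratch for every candidate, which is simple but slow.

```python
from typing import Dict, List, Sequence, Set


def expected_saved_clicks(trails: Sequence[Sequence[str]], qls: Set[str],
                          alpha: Dict[str, float]) -> float:
    """Hypothetical model: the user clicks the best QL they notice;
    URLs missing from `alpha` are treated as always noticed (alpha = 1.0)."""
    total = 0.0
    for trail in trails:
        saving = {u: i for i, u in enumerate(trail) if i > 0 and u in qls}
        p_none = 1.0
        for u in sorted(saving, key=saving.get, reverse=True):
            a = alpha.get(u, 1.0)
            total += saving[u] * a * p_none
            p_none *= 1.0 - a
    return total


def greedy_quicklinks(trails: Sequence[Sequence[str]], k: int,
                      alpha: Dict[str, float]) -> List[str]:
    """Add, up to k times, the URL with the largest marginal gain in expected saved clicks."""
    candidates = sorted({u for trail in trails for u in trail[1:]})
    chosen: List[str] = []
    current = 0.0
    for _ in range(k):
        best_url, best_gain = None, 0.0
        for u in candidates:
            if u in chosen:
                continue
            gain = expected_saved_clicks(trails, set(chosen) | {u}, alpha) - current
            if gain > best_gain:  # strict '>' keeps the first URL (sorted order) on ties
                best_url, best_gain = u, gain
        if best_url is None:  # no remaining URL saves any clicks
            break
        chosen.append(best_url)
        current += best_gain
    return chosen


trails = (
    [["nasa.gov", "a", "QL1", "d"]] * 2
    + [["nasa.gov", "QL1", "e"]]
    + [["nasa.gov", "QL1", "QL2"]] * 2
    + [["nasa.gov", "b"]] * 2
)
print(greedy_quicklinks(trails, 2, {}))  # ['QL1', 'QL2']
```

Because the objective is non-decreasing and submodular, this plain greedy loop is the setting in which the (1-1/e) guarantee applies.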

Page 24

Algorithms

However…
- Inhomogeneous results: QLs for ea.com are fifa08.ea.com and battlefield.ea.com (two games made by EA), plus 6 webpages deep inside thesim2.ea.com
- Redundant results: QLs for senate.gov include obama.senate.gov, obama.senate.gov/about, obama.senate.gov/contact, and obama.senate.gov/votes; the parent URL makes the child URLs redundant
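As an illustration of what such pairwise constraints might look like, here is a sketch with two hypothetical rules. Redundancy: one URL is a path prefix of the other on the same host, as in the obama.senate.gov example. Inhomogeneity: the path depths of two URLs differ by more than a hypothetical `max_depth_gap`. The slides do not specify the actual constraints; all names and thresholds here are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def segments(url: str) -> List[str]:
    """Split a scheme-less URL such as 'obama.senate.gov/about' into host + path parts."""
    return [part for part in url.strip("/").split("/") if part]


def is_ancestor(u: str, v: str) -> bool:
    """True if u is a proper path prefix of v on the same host."""
    su, sv = segments(u), segments(v)
    return len(su) < len(sv) and sv[: len(su)] == su


def conflicts(u: str, v: str, max_depth_gap: int = 2) -> bool:
    """Hypothetical pairwise constraint: True if u and v may not both be QLs."""
    redundant = is_ancestor(u, v) or is_ancestor(v, u)
    inhomogeneous = abs(len(segments(u)) - len(segments(v))) > max_depth_gap
    return redundant or inhomogeneous


def violations(qls: Sequence[str], max_depth_gap: int = 2) -> List[Tuple[str, str]]:
    """All pairs in a candidate QL set that break a constraint."""
    return [(u, v) for u, v in combinations(qls, 2) if conflicts(u, v, max_depth_gap)]


print(violations(["obama.senate.gov/about", "obama.senate.gov/contact", "obama.senate.gov"]))
```

Here the two child pages each conflict with their parent, while the two siblings do not conflict with each other.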

Page 25

Algorithms

Both can be specified as pairwise constraints on the URLs allowed to belong to a QL set.

Pairwise-constrained QL selection is NP-hard.

Two-step process:
1. Heuristically find a large subset of trails that form a tree
2. Enforce the constraints on the tree: a dynamic program is optimal on a tree
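To show why a tree makes the constrained problem tractable, here is a tree dynamic program for a simplified, hypothetical version. It assumes that all trails start at the same search-result page and form a tree, and that the only constraint is redundancy: no QL may be an ancestor of another. Under that constraint each trail meets at most one QL, so a QL u contributes α(u) × depth(u) × (number of trails through u). This is an illustration of the idea, not the paper's dynamic program. URLs without an α value default to 1.0, which is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tree_dp_quicklinks(trails: Sequence[Sequence[str]], k: int,
                       alpha: Dict[str, float]) -> float:
    """Best expected savings with at most k QLs, no QL being an ancestor of another."""
    root = trails[0][0]
    children: Dict[str, List[str]] = defaultdict(list)
    depth: Dict[str, int] = {}
    through: Counter = Counter()  # number of trails passing through each URL
    for trail in trails:
        for d, url in enumerate(trail):
            depth[url] = d
            through[url] += 1
            if d > 0 and url not in children[trail[d - 1]]:
                children[trail[d - 1]].append(url)

    def value(u: str) -> float:
        # Each trail through u meets only this QL, saving depth[u] clicks if noticed.
        return alpha.get(u, 1.0) * depth[u] * through[u]

    def solve(v: str) -> List[float]:
        # best[j] = max expected savings from at most j QLs inside the subtree of v
        best = [0.0] * (k + 1)
        for c in children.get(v, []):
            sub = solve(c)
            # knapsack-style merge: split the j QLs between earlier children and c
            best = [max(best[j - i] + sub[i] for i in range(j + 1)) for j in range(k + 1)]
        if v != root:  # alternatively make v itself a QL, which excludes its subtree
            best = [best[0]] + [max(b, value(v)) for b in best[1:]]
        return best

    return solve(root)[k]


trails = [["home", "a", "a1"], ["home", "a", "a1"], ["home", "a", "a2"], ["home", "b"]]
print([tree_dp_quicklinks(trails, k, {}) for k in (1, 2, 3)])  # [4.0, 6.0, 7.0]
```

With k = 1 the best choice is a1 (2 trails × 2 clicks). With k = 2 it is {a1, a2} rather than the parent a, which the redundancy constraint would not allow alongside either child.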

Page 26

Outline

- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions

Page 27

Experiments

Baseline methods:
- TopClicked: URL score = # search clicks on the URL
- TopVisited: URL score = # occurrences on toolbar trails
- PageRank: build a weighted graph on URLs, where weight(i, j) = # trails using the i→j edge; URL score = PageRank on this graph
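The two trail-based baselines can be sketched as follows (TopClicked needs search logs and is omitted). This is a minimal sketch, assuming trails are lists of URLs. It uses standard power-iteration PageRank with damping 0.85, 100 iterations, and uniform redistribution of mass from pages with no outgoing edges. These parameter choices and the function names are assumptions, not the authors' settings.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def top_visited(trails: Sequence[Sequence[str]]) -> Counter:
    """TopVisited: score = number of occurrences of the URL on toolbar trails."""
    return Counter(url for trail in trails for url in trail)


def weighted_pagerank(trails: Sequence[Sequence[str]], damping: float = 0.85,
                      iters: int = 100) -> Dict[str, float]:
    """PageRank on the trail graph, where weight(i, j) = number of trails using edge i -> j."""
    weight: Dict[str, Counter] = defaultdict(Counter)
    nodes = set()
    for trail in trails:
        nodes.update(trail)
        for i, j in set(zip(trail, trail[1:])):  # count each trail once per edge
            weight[i][j] += 1
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        dangling = 0.0  # rank mass of pages with no outgoing edges
        for u in nodes:
            out = weight.get(u)
            if out:
                total = sum(out.values())
                for v, w in out.items():
                    new[v] += damping * rank[u] * w / total
            else:
                dangling += rank[u]
        for u in nodes:
            new[u] += damping * dangling / n  # spread dangling mass uniformly
        rank = new
    return rank


trails = [["home", "a", "b"], ["home", "a"], ["home", "c"]]
print(top_visited(trails).most_common(2))  # [('home', 3), ('a', 2)]
```

Page a outranks page c because two trails use the home→a edge and only one uses home→c.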

Page 28

Experiments

Live Traffic dataset: CTRs computed on the QLs currently displayed by Yahoo! (a 1043-website subset)

Measure:
- Pick two equal-size subsets of QLs
- Use sum-of-scores and sum-of-CTRs to predict the better subset
- Measure how often the predictions match
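The agreement measure can be sketched for one website as follows. This sketch makes several assumptions the slides do not state. It enumerates every pair of disjoint, equal-size subsets exhaustively rather than sampling. It skips pairs where either the score sums or the CTR sums tie. It returns NaN when no pair remains. `agreement_rate` and the toy numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict


def agreement_rate(scores: Dict[str, float], ctrs: Dict[str, float], size: int) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs pick the same winner."""
    qls = sorted(ctrs)
    agree = total = 0
    for a in combinations(qls, size):
        rest = [q for q in qls if q not in a]
        for b in combinations(rest, size):
            if a > b:
                continue  # each unordered pair {a, b} is generated twice; keep one
            score_diff = sum(scores[q] for q in a) - sum(scores[q] for q in b)
            ctr_diff = sum(ctrs[q] for q in a) - sum(ctrs[q] for q in b)
            if score_diff == 0 or ctr_diff == 0:
                continue  # a tie gives no prediction (assumption)
            total += 1
            agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / total if total else float("nan")


ctrs = {"a": 1, "b": 2, "c": 4, "d": 8}
scores = {"a": 1, "b": 2, "c": 4, "d": 3}  # ranks c above d, unlike the CTRs
print(agreement_rate(scores, ctrs, 1))  # 5 of 6 pairs agree
```

Averaging this rate over websites for each subset size gives one curve per scoring method.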

Page 29

Experiments: Live Traffic Data

[Figure: fraction of subset pairs where predictions agree with live traffic, plotted against subset size]

QL-ALG > TopVisited > PageRank > TopClicked

Page 30

Experiments

Tree-structured trails:
- Most dropped trails are very short
- Tree-structured trails improve accuracy

[Figure, two panels: “Live Traffic prediction quality comparison”; “Distribution of dropped trails” (number of trails dropped vs. length of trail)]
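The slides do not describe the heuristic that finds a large tree-shaped subset of trails. One plausible sketch, labeled as hypothetical: give every URL its most frequent predecessor as its single parent, then keep only trails that follow those parent links and never revisit a page. The kept trails then form a tree rooted at the search-result page, assuming all trails start there. The dropped trails are the ones whose length distribution the figure summarizes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def tree_consistent_trails(
    trails: Sequence[Sequence[str]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical heuristic: return (kept, dropped) trails, where kept trails form a tree."""
    parent_counts: Dict[str, Counter] = defaultdict(Counter)
    for trail in trails:
        for prev, cur in zip(trail, trail[1:]):
            parent_counts[cur][prev] += 1
    # Each URL keeps only its most frequent predecessor as parent.
    parent = {u: c.most_common(1)[0][0] for u, c in parent_counts.items()}
    kept: List[List[str]] = []
    dropped: List[List[str]] = []
    for trail in trails:
        ok = len(set(trail)) == len(trail) and all(
            parent[cur] == prev for prev, cur in zip(trail, trail[1:])
        )
        (kept if ok else dropped).append(list(trail))
    return kept, dropped


trails = [["h", "a", "x"], ["h", "a", "x"], ["h", "b", "x"], ["h", "c"]]
kept, dropped = tree_consistent_trails(trails)
print(dropped)                             # [['h', 'b', 'x']]
print(Counter(len(t) for t in dropped))    # length distribution of dropped trails
```

Page x is reached via a in two trails and via b in one, so the minority trail through b is dropped.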

Page 31

Outline

- Motivation and Related Work
- Problem Formulation
- Proposed Solution
- Experiments
- Conclusions

Page 32

Conclusions

- Proposed a formulation for the QL selection problem
  - Both toolbar and search logs are used intuitively
- Proposed two algorithms:
  - Greedy: (1-1/e)-optimal
  - Tree-structured: empirically better
- Improvement of 22% over competing baselines