Can Social Bookmarking Improve Web Search? (cdn.paulheymann.com/stanford/wsdm_talk_20080212.pdf)

Can Social Bookmarking Improve Web Search? Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina Department of Computer Science Stanford University February 12th, 2008


Can Social Bookmarking Improve Web Search?

Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina

Department of Computer Science

Stanford University

February 12th, 2008


Outline

Introduction

Problem Statement

Data Gathering Methodology

Analysis

Conclusions


YouTube


Amazon


del.icio.us


Problem Statement

Can social bookmarking improve web search?


Subproblems

Are there “enough” URLs?
Are there “enough” tags?
Are the URLs valuable?
Are the tags redundant?


Tags versus Other Content


del.icio.us posts

Bookmarks/Posts                                Triples
paul: news, uk → bbc.co.uk (08:33:25)          (paul, news, bbc.co.uk)
                                               (paul, uk, bbc.co.uk)
mary: recipes, food → food.com (08:33:23)      (mary, recipes, food.com)
                                               (mary, food, food.com)
dave: tv, cnn, news → cnn.com (08:33:21)       (dave, tv, cnn.com)
                                               (dave, cnn, cnn.com)
                                               (dave, news, cnn.com)
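The post-to-triple expansion on this slide can be sketched in a few lines of Python. This is only an illustration of the data model shown above, not the authors' code; the `Post` type and the `post_to_triples` function are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Tuple


class Post(NamedTuple):
    """One bookmark: a user saves a URL with zero or more tags (hypothetical type)."""
    user: str
    tags: Tuple[str, ...]
    url: str
    time: str


def post_to_triples(post: Post) -> List[Tuple[str, str, str]]:
    """Expand one post into one (user, tag, url) triple per tag."""
    return [(post.user, tag, post.url) for tag in post.tags]


# The three example posts from the slide.
posts = [
    Post("paul", ("news", "uk"), "bbc.co.uk", "08:33:25"),
    Post("mary", ("recipes", "food"), "food.com", "08:33:23"),
    Post("dave", ("tv", "cnn", "news"), "cnn.com", "08:33:21"),
]

triples = [triple for post in posts for triple in post_to_triples(post)]
for triple in triples:
    print(triple)
```

Each post yields as many triples as it has tags, so the three posts above expand to seven triples.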


Realtime Web Crawling


Size and Growth

≈ 120 thousand (≈ 10^5) posts/day (versus ≈ 10^6 blog posts/day)

60–150 million posts

12–75 million (≈ 10^7–10^8) unique URLs (versus ≈ 10^9–10^11 total URLs)

[Figure, two panels: Estimated Number of Posts (y-axis) versus Date (x-axis). Panel 1: x-axis ticks at August 1, 2005, December 9, 2005, and August 16, 2006; y-axis from 0 to 120,000. Panel 2: x-axis ticks at November 11, 2006 and July 5, 2007; y-axis from 30,000 to 150,000.]
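To make the size comparison concrete, the ranges on this slide can be turned into a rough bound on how much of the web the bookmarked URLs could cover. The sketch below uses only the slide's order-of-magnitude estimates; the variable names are illustrative, and the result is a back-of-the-envelope range, not a figure from the paper.

```python
from fractions import Fraction

# Estimates taken from the slide (both are ranges, not exact counts).
unique_bookmarked_urls = (12_000_000, 75_000_000)  # 12-75 million
total_web_urls = (10**9, 10**11)                   # 10^9 to 10^11

# Smallest plausible share: fewest bookmarked URLs over the largest web.
lowest_share = Fraction(unique_bookmarked_urls[0], total_web_urls[1])
# Largest plausible share: most bookmarked URLs over the smallest web.
highest_share = Fraction(unique_bookmarked_urls[1], total_web_urls[0])

print(f"{float(lowest_share):.4%} to {float(highest_share):.2%}")
```

Under these estimates the bookmarked URLs amount to somewhere between roughly 0.012% and 7.5% of all URLs.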


URL Indexing and Age

Found Initially             57.5%
Indexed Within 4 Weeks      12.75%
Indexed Within 6 Months     12.75%
Never Indexed               17%
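A quick consistency check on the four percentages above, with one derived split. This is only arithmetic on the slide's numbers; the dictionary keys are labels chosen here, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Shares from the slide, in percent of the sampled URLs.
shares = {
    "found_initially": Fraction("57.5"),
    "indexed_within_4_weeks": Fraction("12.75"),
    "indexed_within_6_months": Fraction("12.75"),
    "never_indexed": Fraction("17"),
}

total = sum(shares.values())

# URLs that were not found at first, and the part of them indexed later.
not_found_initially = 100 - shares["found_initially"]
indexed_later = shares["indexed_within_4_weeks"] + shares["indexed_within_6_months"]
later_share = indexed_later / not_found_initially

print(f"total: {float(total)}%")
print(f"not found initially: {float(not_found_initially)}%")
print(f"of those, indexed later: {float(later_share):.0%}")
```

The four categories add up to 100%, and of the 42.5% of URLs not found initially, 60% were indexed later and 40% never were.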

Of the 57.5% found initially, modification time at time of post:


Tagging Caveats (“The Tagging 6”)

1. Title (16%). Examples: “oil”, “prices”

2. Whole Domain (20%). Examples: “news”, “cnn”

3. Page Text (50%). Example: “singapore”

4. Extended Text (80%). Example: “inflation”

5. Irrelevant (7%). Example: “stanford”

6. Subjective (<5%). Example: “funny”
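As a rough illustration of what "redundant" means for the two most literal categories, the sketch below checks whether a tag already occurs as a word in a page's title or body text. The slides do not say how the study matched tags against content, so the tokenization and the `tag_redundancy` function are assumptions, and the sample title and text are invented for the example. The other four categories are not covered because the slides do not define them.

```python
import re
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def tag_redundancy(tag: str, title: str, page_text: str) -> Dict[str, bool]:
    """Report whether the tag already appears as a word in the title or page text."""
    word = tag.lower()
    return {
        "in_title": word in tokenize(title),
        "in_page_text": word in tokenize(page_text),
    }


# Invented sample page, loosely echoing the slide's example tags.
title = "Oil prices rise again"
page_text = "Crude oil futures climbed in Singapore trading."

for tag in ("oil", "singapore", "funny"):
    print(tag, tag_redundancy(tag, title, page_text))
```

A tag found in neither place, such as “funny” here, is one that plain text indexing of the title and body would not supply on its own.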


Tags versus Other Content


Conclusions

1. Social bookmarking URLs are new and recent, though many tags may be redundant (given title, text, domains).

2. Social bookmarking is a large phenomenon, but not nearly as large as the web.

3. Despite this, relevant URLs are well represented, and popular tags overlap with popular queries.
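One simple way to quantify the tag and query overlap mentioned in the third conclusion is the share of the top-k tags that also appear among the top-k queries. The sketch below is a generic measure, not necessarily the one used in the paper; `top_k_overlap` is a hypothetical helper, and the two ranked lists are made-up toy data.

```python
from typing import Sequence


def top_k_overlap(tags: Sequence[str], queries: Sequence[str], k: int) -> float:
    """Fraction of the k most popular tags that also occur in the k most popular queries.

    Both sequences are assumed to be sorted from most to least popular.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_tags = {tag.lower() for tag in tags[:k]}
    top_queries = {query.lower() for query in queries[:k]}
    return len(top_tags & top_queries) / k


# Toy data for illustration only; not taken from the study.
popular_tags = ["news", "music", "video", "blog", "design"]
popular_queries = ["news", "video", "weather", "maps", "music"]

print(top_k_overlap(popular_tags, popular_queries, k=5))
```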

Questions?

Check out the full paper at http://dbpubs.stanford.edu/ or in the proceedings!
