a generalized co-hits algorithm and its application to bipartite graphs

19
1 A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs Hongbo Deng, Michael R. Lyu and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong July 1st, 2009

Upload: heidi-charles

Post on 02-Jan-2016

29 views

Category:

Documents


5 download

DESCRIPTION

A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs. Hongbo Deng, Michael R. Lyu and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong July 1st , 2009. Link Analysis. IR Models. for. for. - HITS. - VSM. - PageRank. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

1

A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

Hongbo Deng, Michael R. Lyu and Irwin King

Department of Computer Science and Engineering

The Chinese University of Hong Kong

July 1st, 2009

Page 2: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

2Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Introduction

Many data can be modeled as bipartite graphs

Content Graph

IR Modelsfor

Link Analysisfor

- VSM

- Language Model

- etc.

- HITS

- PageRank

- etc.

Incorporate Content with Graph- Personalized PageRank (PPR)

- Linear Combination

- etc.

Semantic relationsRelevance

Page 3: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

3Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

d1

map www.maps.comq2 d2

q3Google map

Query URL

maps.google.comd3

www.google.comdj

www.mapquest.commapquest q1

...

google qi

...

An Illustration

PPR

mapquest

map quest

google

united states map

mapquest.com

HITS

google

mapquest

google.com

map quest

weather

Noisy link data

Lack of relevance constraints

More reasonable

mapquest

united states map

map of florida

us map

world map

Query suggestion

for query “map”:d1

map www.maps.comq2 d2

q3Google map

Query URL

maps.google.comd3

www.google.comdj

www.mapquest.commapquest q1

...

google qi

...

Page 4: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

4Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Outline

Introduction Generalized Co-HITS

Preliminaries Iterative Framework Regularization Framework

Experiments Conclusion

Page 5: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

5Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Preliminaries

Graph

Hidden links:

Explicit links:

Content

X Y

Page 6: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

6Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Basic idea Incorporate the bipartite graph with the content

information from both sides Initialize the vertices with the relevance scores x0, y0

Propagate the scores (mutual reinforcement)

Generalized Co-HITS

Initial scores Score propagation

Page 7: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

7Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Iterative framework

Generalized Co-HITS

Page 8: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

8Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Iterative Regularization Framework

Consider the vertices on one side

Smoothness Fit initial scores

U

Wuu

Page 9: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

9Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Generalized Co-HITS

Regularization Framework

R1R2R3

Wuu Wvv

Intuition: the highly connected vertices are most likely to have similar relevance scores.

Page 10: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

10Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Generalized Co-HITS

Regularization FrameworkThe cost function: Optimization problem:

Solution:

Page 11: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

11Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Application to Query-URL Bipartite Graphs

Bipartite graph construction Edge weighted by the click frequency Normalize to obtain the transition matrix

Overall Algorithm

Page 12: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

12Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Outline

Introduction Preliminaries Generalized Co-HITS

Iterative Framework Regularization Framework

Experiments Conclusion

Page 13: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

13Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Experimental Evaluation

Data collection AOL query log data

Cleaning the data Removing the queries that appear less than 2 times Combining the near-duplicated queries 883,913 queries and 967,174 URLs 4,900,387 edges 250,127 unique terms

Page 14: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

14Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Evaluation: ODP Similarity

A simple measure of similarity among queries using ODP categories (query category)

Definition:

Example: Q1: “United States” “Regional > North America >

United States” Q2: “National Parks” “Regional > North America >

United States > Travel and Tourism > National Parks and Monuments”

Precision at rank n (P@n):

300 distinct queries

3/5

Page 15: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

15Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Experimental Results

Comparison of Iterative Framework

The improvements of OSP and CoIter over the baseline (the dashed line) are promising when compared to the PPR. The initial relevance scores from both sides provide valuable information.

personalized PageRank one-step propagation general Co-HITS

Result 1:

Page 16: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

16Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Experimental Results

Comparison of Regularization Frameworksingle-sided regularization double-sided regularization

SiRegu can improve the performance over the baseline. CoRegu performs better than SiRegu, which owes to the newly developed cost function R3. Moreover, CoRegu is relatively robust.

Result 2:

Page 17: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

17Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Experimental Results

Detailed Results

The CoRegu-0.5 achieves the best performance. It is very essential and promising to consider the double-sided regularization framework for the bipartite graph.

Result 3:

Page 18: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

18Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Conclusions

Propose the Co-HITS algorithm to incorporate the bipartite graph with the content information from both sides.

The Co-HITS algorithm is more general, which includes HITS and personalized PageRank as special cases.

The CoRegu is more robust with the newly developed cost function, which achieves the best performance with consistent and promising improvements.

Page 19: A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

19Hongbo Deng, Michael R. Lyu and Irwin KingDepartment of Computer Science and Engineering

The Chinese University of Hong Kong

Q&A

Thanks!