Homepage Live: Automatic Block Tracing for Web Personalization. J. Han, D. Han, C. Lin, H.J. Zeng, Z. Chen, Y. Yu. Proceedings of the 16th International Conference on World Wide Web, 2007. Reporter: Shih-Feng Yang, 2007/8/9

Page 1: Home Page Live(Www2007)

Homepage Live: Automatic Block Tracing for Web Personalization

J. Han, D. Han, C. Lin, H.J. Zeng, Z. Chen, Y. Yu

Proceedings of the 16th International Conference on World Wide Web, 2007

Reporter: Shih-Feng Yang, 2007/8/9

Page 2: Home Page Live(Www2007)

Outline

Introduction
Homepage Live
Tree mapping algorithm for block tracking
Experiments and Analysis
Conclusion

Page 3: Home Page Live(Www2007)

Introduction

Personalized homepage services let web users select web content and aggregate it in a single web page.

However, defining the content blocks and keeping the information up to date requires manual effort.

Page 4: Home Page Live(Www2007)

Homepage Live

An application that offers “one-stop browsing” for users.

It lets users collect blocks from different web pages and organize them in a single page.

It automatically traces the collected blocks and presents their current content to the user.

Page 5: Home Page Live(Www2007)

Homepage Live

Page 6: Home Page Live(Www2007)

Homepage Live

Two steps of Homepage Live:

1. Collecting the Blocks

Users select the blocks they want by dragging and dropping them with the mouse.

Page 7: Home Page Live(Www2007)

Homepage Live

Two steps of Homepage Live:

2. Tracing Web Page Blocks

A tracing algorithm analyzes the original pages and the new pages.

It detects the new position of each block in the updated pages.

Page 8: Home Page Live(Www2007)

Homepage Live

Two steps of Homepage Live:

2. Tracing Web Page Blocks

Page 9: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Simple methods

Direct Path Finding
  Record the tags on the path from the root node to the target block, and use that path to trace the evolved block.
  Cannot handle a block whose position changes.

Tag String Matching
  To find the evolved block, compare the original tag sequence from the old page with the tag sequence of every sub-tree in the new page.
  Use the longest common subsequence (LCS) as the similarity measure.
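The two simple methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not code from the paper: the `Node` class, the choice to record a child index next to each tag on the path, and the normalization of the LCS length by sequence length (without it, the whole page would always score highest).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical DOM-like node: a tag plus child nodes."""
    tag: str
    children: tuple = ()


def subtrees(node):
    """Yield every node (sub-tree root) in pre-order."""
    yield node
    for child in node.children:
        yield from subtrees(child)


# --- Direct Path Finding ---------------------------------------------------
def record_path(root, target, prefix=()):
    """Return the (tag, child index) steps from root down to target, or None."""
    if root is target:
        return list(prefix)
    for i, child in enumerate(root.children):
        found = record_path(child, target, prefix + ((child.tag, i),))
        if found is not None:
            return found
    return None


def follow_path(root, path):
    """Replay a recorded path on another tree; None if any step no longer fits."""
    node = root
    for tag, i in path:
        if i >= len(node.children) or node.children[i].tag != tag:
            return None
        node = node.children[i]
    return node


# --- Tag String Matching ---------------------------------------------------
def tag_sequence(node):
    """Pre-order list of the tags in the sub-tree rooted at node."""
    return [n.tag for n in subtrees(node)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def trace_by_tag_string(block, new_root):
    """Pick the sub-tree of the new page most similar to the old block."""
    target = tag_sequence(block)

    def score(candidate):
        seq = tag_sequence(candidate)
        # Assumed normalization so that large sub-trees are not favored.
        return 2 * lcs_length(target, seq) / (len(target) + len(seq))

    return max(subtrees(new_root), key=score)
```

In this sketch, inserting a new sibling before the block makes `follow_path` return `None`, while `trace_by_tag_string` can still locate the block because it does not depend on position.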

Page 10: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Page 11: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Case 1: No node in T is mapped to a node in T'. Then

Dis(T,T') = n(T) + n(T')

T: the original tree.

T': the evolved tree.

n(T): the number of nodes in T.

Page 12: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Case 2: r is mapped to r'.

r: the root node of T.

r': the root node of T'.

p_i, p_i': monotonically increasing indices of the mapped sub-trees.

m: the number of mapped sub-tree pairs (S_pi, S'_pi').

S: a sub-tree.
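The formula for this case did not survive in the transcript. One plausible form, written only from the symbols defined on this slide and the unit cost per unmapped node used in Case 1, is given below. It is an assumption, not the slide's original equation: mapped sub-tree pairs contribute their own distance, and every unmapped child sub-tree of r or r' contributes its node count.

```latex
\mathrm{Dis}(T,T') \;=\;
  \sum_{i=1}^{m} \mathrm{Dis}\!\left(S_{p_i},\, S'_{p'_i}\right)
  \;+\; \sum_{j \notin \{p_1,\dots,p_m\}} n(S_j)
  \;+\; \sum_{j \notin \{p'_1,\dots,p'_m\}} n(S'_j),
\qquad p_1 < \dots < p_m,\quad p'_1 < \dots < p'_m .
```

Here S_j and S'_j are assumed to denote the child sub-trees of r and r' in document order.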

Page 13: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Case 2: A standard dynamic programming algorithm can be used to calculate the mapping with the minimum edit distance.

Page 14: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Case 3: r is mapped to s', the root node of a sub-tree S' in T'.

Dis(T,T') = n(T') - n(S') + Dis(T,S')

Page 15: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Tree Edit Distance

Page 16: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

1. Finding Fix Nodes
  Fix Node: a node whose tag and attributes are both unchanged between the two trees.
  All the tags and contents of the nodes in the original tree are indexed.
  Duplicated nodes in the original tree are removed.
  All nodes in the new tree are checked sequentially to find the fix nodes.
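The indexing step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class is hypothetical, and a node's identity is taken to be its tag, attributes, and text together, which is one reading of "tag and attributes" plus "tags and contents". Duplicates in the new tree are not handled specially.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical DOM-like node."""
    tag: str
    attrs: tuple = ()      # (name, value) pairs
    text: str = ""
    children: tuple = ()


def walk(node):
    """Yield all nodes in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def signature(node):
    """Assumed identity of a node for indexing purposes."""
    return (node.tag, node.attrs, node.text)


def find_fix_nodes(old_root, new_root):
    """Return (old node, new node) pairs whose signature is unchanged."""
    # Index the original tree and drop signatures that occur more than once.
    counts = Counter(signature(n) for n in walk(old_root))
    index = {signature(n): n for n in walk(old_root) if counts[signature(n)] == 1}
    # Scan the new tree sequentially and look each node up in the index.
    pairs = []
    for node in walk(new_root):
        old = index.get(signature(node))
        if old is not None:
            pairs.append((old, node))
    return pairs
```

Using a hash index keeps the lookup for each node of the new tree at roughly constant time, so the scan is linear in the size of the two trees.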

Page 17: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

2. Generating the Reduced Trees
  Common Sub-Tree Pair
    The roots of the two sub-trees are the same Fix Node.
    The two sub-trees contain the same set of Fix Nodes, and none of their own sub-trees contains all of those Fix Nodes.
  Minimal Common Sub-Tree
    The common sub-tree with the minimum size.
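A rough sketch of locating the minimal common sub-tree pair around a traced block is shown below. It is an approximation under assumptions, not the paper's procedure: nodes are identified by a hypothetical `(tag, text)` signature, and the search simply walks up from the block and returns the nearest ancestor that is a Fix Node and whose two sub-trees hold the same set of Fix Nodes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical DOM-like node."""
    tag: str
    text: str = ""
    children: tuple = ()


def walk(node):
    """Yield all nodes in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def signature(node):
    return (node.tag, node.text)


def fix_signatures(old_root, new_root):
    """Signatures unique in the old tree that also appear in the new tree."""
    counts = Counter(signature(n) for n in walk(old_root))
    unique_old = {sig for sig, count in counts.items() if count == 1}
    return unique_old & {signature(n) for n in walk(new_root)}


def chain_to(root, target):
    """Nodes from root down to target (inclusive), or [] if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        chain = chain_to(child, target)
        if chain:
            return [root] + chain
    return []


def minimal_common_subtree(old_root, new_root, block):
    """Smallest (old, new) sub-tree pair around the block, or None."""
    fixed = fix_signatures(old_root, new_root)
    new_by_sig = {signature(n): n for n in walk(new_root)}
    # Walk upward from the block so the first hit is the smallest sub-tree.
    for old in reversed(chain_to(old_root, block)):
        sig = signature(old)
        if sig not in fixed:
            continue
        new = new_by_sig[sig]
        old_set = {signature(n) for n in walk(old)} & fixed
        new_set = {signature(n) for n in walk(new)} & fixed
        if old_set == new_set:
            return old, new
    return None
```

Restricting the later edit-distance computation to the returned pair is what keeps the mapping cheap: only the nodes inside these two sub-trees need to be compared.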

Page 18: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

2. Generating the Reduced Trees
  Finding the minimal common sub-tree

Page 19: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

2. Generating the Reduced Trees
  First, find the minimal common sub-tree pair that contains the tracing blocks.
  Second, prune away sub-trees that are intuitively unnecessary (in a rule-based fashion).
    For each Fix Node, all of its ancestor nodes, except the nodes that lie on the path from the root to the tracing block, should be cut off.

Page 20: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

2. Generating the Reduced Trees

Page 21: Home Page Live(Www2007)

Tree mapping algorithm for block tracking

Fixed Sub-tree Based Tracing

3. Mapping on the Reduced Trees
  After steps 1 and 2, only the nodes remaining in the minimal common sub-tree are considered by the minimum edit distance algorithm.

Page 22: Home Page Live(Www2007)

Experiments and Analysis

Data Set
  25 URLs, with 101 page versions for each URL (one version every 30 minutes).
  Five users select the blocks they are interested in from the first version of each of the 25 URLs.
  The users then also mark the evolved blocks in the later 100 versions.
  In total, 12,625 blocks are marked.

Page 23: Home Page Live(Www2007)

Experiments and Analysis

Metrics
  Correct Tracing Rate (CTR) = correct tracing count / total tracing count (total tracing count = 12,500)
  Correct Case Rate (CCR) = correct case count / total case count (total case count = 125)
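The totals follow from the data set: 5 users x 25 URLs = 125 cases, and 125 cases x 100 later versions = 12,500 tracing operations (125 x 101 = 12,625 marked blocks when the first version is included). The two rates can be computed as in the sketch below. The `results` layout is hypothetical, and treating a case as correct only when every version is traced correctly is an assumption, since the slide does not define a correct case.

```python
def correct_tracing_rate(results):
    """CTR: correctly traced versions divided by all traced versions."""
    total = sum(len(flags) for flags in results.values())
    correct = sum(sum(flags) for flags in results.values())
    return correct / total


def correct_case_rate(results):
    """CCR: cases traced correctly in every version, divided by all cases."""
    correct_cases = sum(1 for flags in results.values() if all(flags))
    return correct_cases / len(results)


# Toy data: one list of per-version outcomes for each (user, URL) case.
results = {
    ("user1", "url1"): [True, True, True, True],
    ("user1", "url2"): [True, False, True, True],
}
print(correct_tracing_rate(results))  # 7 of 8 tracings are correct
print(correct_case_rate(results))     # 1 of 2 cases is fully correct
```

Under this reading, CCR is the stricter metric: a single wrong version makes the whole case count as incorrect, while CTR only loses one tracing.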

Page 24: Home Page Live(Www2007)

Experiments and Analysis

CTR and CCR

DPF: Direct Path Finding
TSM: Tag String Matching
TED: Tree Edit Distance
FSBT: Fixed Sub-tree Based Tracing

Page 25: Home Page Live(Www2007)

Experiments and Analysis

The size of the web page (number of nodes) and its change rate (of block position) have little impact on the algorithm.

=> This indicates that the algorithm scales well.

Page 26: Home Page Live(Www2007)

Experiments and Analysis

Computational Cost (units: milliseconds and kilobytes)

Page 27: Home Page Live(Www2007)

Conclusion

A novel application, Homepage Live, has been proposed for tracing blocks of interest on different web pages.

It uses tree edit distance to trace a block when the page is updated.

With the ability to automatically recognize and trace web blocks, sections or gadgets can be developed for personalized homepages.