home page live(www2007)
1
Homepage live: automatic block tracing for web personalization
J. Han, D. Han, C. Lin, H.J. Zeng, Z. Chen, Y. Yu
Proceedings of the 16th International Conference on World Wide Web, 2007
Reporter: Shih-Feng Yang
2007/8/9
2
Outline
Introduction
Homepage Live
Tree mapping algorithm for block tracking
Experiments and Analysis
Conclusion
3
Introduction
Personalized homepage services have enabled web users to select web contents and to aggregate them in a single web page.
However, this requires manual effort to define the content blocks and to keep the information up to date.
4
Homepage Live
An application that offers “one-stop browsing” for users.
Lets users collect blocks from different web pages and organize them in a single page.
It automatically traces the extracted blocks and presents their real-time content to the user.
5
Homepage Live
6
Homepage Live
Two steps of Homepage Live:
1. Collecting the Blocks
Users select the blocks they want by dragging and dropping them with the mouse.
7
Homepage Live
Two steps of Homepage Live:
2. Tracing Web Page Blocks
A tracing algorithm compares the original pages with the new pages.
It detects the new position of each block in the updated pages.
8
Homepage Live
Two steps of Homepage Live:
2. Tracing Web Page Blocks
9
Tree mapping algorithm for block tracking
Simple methods
Direct Path Finding: record the tags on the path from the root node to the target block, and use that path to trace the evolved block. It cannot handle a change in the block's position.
Tag String Matching: to find the evolved block, compare the tag sequence of the original block in the old page with the tag sequence of every sub-tree in the new page. The longest common subsequence (LCS) is used as the similarity measure.
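The two simple methods can be sketched in a few lines of Python. This is an illustration, not the authors' implementation. It assumes a page is a nested tuple `(tag, children)`, that a recorded path stores the child index together with the tag, and that the LCS length is normalized by the two sequence lengths so that large sub-trees do not win automatically. The helper names (`node`, `record_path`, `follow_path`, `trace_by_tag_string`) and the normalization are assumptions made for this sketch.

```python
def node(tag, *children):
    """Build a tree node as a nested tuple (tag, children)."""
    return (tag, children)

def subtrees(tree):
    """Yield the tree itself and every sub-tree, in pre-order."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

# --- Direct Path Finding -------------------------------------------
def record_path(page, block):
    """Return the (child index, tag) steps from the root to the first
    sub-tree equal to `block`, or None if it is not in the page."""
    if page == block:
        return []
    for i, child in enumerate(page[1]):
        rest = record_path(child, block)
        if rest is not None:
            return [(i, child[0])] + rest
    return None

def follow_path(page, path):
    """Replay a recorded path on a new page; None if any step fails."""
    current = page
    for i, tag in path:
        children = current[1]
        if i >= len(children) or children[i][0] != tag:
            return None  # the block moved, so the path no longer matches
        current = children[i]
    return current

# --- Tag String Matching -------------------------------------------
def preorder_tags(tree):
    """Flatten a sub-tree into its pre-order tag sequence."""
    return [t[0] for t in subtrees(tree)]

def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def trace_by_tag_string(block, new_page):
    """Return the sub-tree of new_page most similar to block."""
    target = preorder_tags(block)
    def score(candidate):
        seq = preorder_tags(candidate)
        # Normalized LCS (an assumption of this sketch).
        return 2 * lcs_length(target, seq) / (len(target) + len(seq))
    return max(subtrees(new_page), key=score)
```

When a new element is inserted before the block, `follow_path` returns `None` because the recorded index now points at a different tag, while `trace_by_tag_string` can still pick the moved block.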
10
Tree mapping algorithm for block tracking
Tree Edit Distance
11
Tree mapping algorithm for block tracking
Tree Edit Distance
Case 1: no node in T is mapped to a node in T’. Then
Dis(T,T’) = n(T) + n(T’)
T: the original tree.
T’: the evolved tree.
n(T): the number of nodes in T.
12
Tree mapping algorithm for block tracking
Tree Edit Distance
Case 2: r is mapped to r’.
r: the root node of T.
r’: the root node of T’.
p_i, p_i’: monotonically increasing.
m: assume there are m pairs of (S_{p_i}, S’_{p_i’}).
S: a sub-tree.
13
Tree mapping algorithm for block tracking
Tree Edit Distance
Case 2: a standard dynamic programming algorithm can be used to calculate the mapping with minimum edit distance. For example:
14
Tree mapping algorithm for block tracking
Tree Edit Distance
Case 3: r is mapped to the root node s’ of a sub-tree S’ in T’. Then
Dis(T,T’) = n(T’) - n(S’) + Dis(T,S’)
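The three cases can be combined into one recursive function. The following Python sketch is a reading of the slides, not the authors' code. Cases 1 and 3 follow the formulas given on the slides. For Case 2 the slides do not reproduce the formula, so the sketch assumes that roots can be mapped only when their tags are equal, that child sub-trees are aligned in order by dynamic programming, and that an unmapped child sub-tree costs its node count. Trees are nested tuples `(tag, children)`; all function names are hypothetical.

```python
import math
from functools import lru_cache

def node(tag, *children):
    """Build a tree node as a nested (hashable) tuple (tag, children)."""
    return (tag, children)

def size(tree):
    """n(T): the number of nodes in the tree."""
    return 1 + sum(size(c) for c in tree[1])

def subtrees(tree):
    """Yield the tree itself and every sub-tree."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def rooted_distance(t, u):
    """Case 2: the root of t is mapped to the root of u.
    Assumption: mapping requires equal tags; children are aligned in
    order, and an unmapped child sub-tree costs its node count."""
    if t[0] != u[0]:
        return math.inf
    a, b = t[1], u[1]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + size(a[i - 1])
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + size(b[j - 1])
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a[i - 1]),                  # child of t unmapped
                d[i][j - 1] + size(b[j - 1]),                  # child of u unmapped
                d[i - 1][j - 1] + distance(a[i - 1], b[j - 1]) # children mapped
            )
    return d[len(a)][len(b)]

@lru_cache(maxsize=None)
def distance(t, u):
    """Dis(T, T'): minimum over the three cases."""
    best = size(t) + size(u)  # Case 1: nothing is mapped
    for s in subtrees(u):
        # s is u itself -> Case 2; s is a proper sub-tree -> Case 3:
        # n(T') - n(S') + Dis(T, S') with the roots of T and S' mapped.
        best = min(best, size(u) - size(s) + rooted_distance(t, s))
    return best

old = node("div", node("h1"), node("ul", node("li"), node("li")))
new = node("body", node("div", node("h1"),
                        node("ul", node("li"), node("li"), node("li"))))
print(distance(old, new))  # one wrapper node plus one inserted item -> 2
```

The sketch favors clarity over speed: it recomputes sub-tree sizes and tries every sub-tree of the evolved tree, which is acceptable only for small examples.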
15
Tree mapping algorithm for block tracking
Tree Edit Distance
16
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
1. Finding Fix Nodes
Fix Node: a node whose tag and attributes are both unchanged between the two trees.
All the tags and contents of the nodes in the original tree are indexed.
Duplicated nodes in the original tree are removed.
All nodes in the new tree are checked sequentially to find the fix nodes.
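Step 1 amounts to building an index over the original tree and scanning the new tree against it. The Python sketch below illustrates that idea under stated assumptions: the `Node` class and the `signature` function (tag, sorted attributes, and text content) are made up for this example, and signatures that repeat in the new tree are also discarded, which the slides do not specify.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical DOM-like node used only for this sketch."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)

def walk(root):
    """Yield every node of the tree in document (pre-order) order."""
    yield root
    for child in root.children:
        yield from walk(child)

def signature(n):
    """Key used for indexing: tag, attributes and text content."""
    return (n.tag, tuple(sorted(n.attrs.items())), n.text)

def find_fix_nodes(old_root, new_root):
    """Return (old node, new node) pairs whose signature is unchanged."""
    old_counts = Counter(signature(n) for n in walk(old_root))
    new_counts = Counter(signature(n) for n in walk(new_root))
    # Index the original tree, dropping duplicated nodes.
    index = {signature(n): n for n in walk(old_root)
             if old_counts[signature(n)] == 1}
    pairs = []
    for n in walk(new_root):  # sequential scan of the new tree
        sig = signature(n)
        # Assumption: ambiguous (repeated) nodes in the new tree are skipped.
        if sig in index and new_counts[sig] == 1:
            pairs.append((index[sig], n))
    return pairs
```

Because both trees are visited once and lookups go through a dictionary, this step is roughly linear in the number of nodes.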
17
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
2. Generating the Reduced Trees
Common Sub-Tree Pair: the roots of the two sub-trees are the same fix node; the two sub-trees contain the same set of Fix Nodes; and none of their sub-trees contains all of those Fix Nodes.
Minimal Common Sub-Tree: the common sub-tree with minimum size.
18
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
2. Generating the Reduced Trees
Finding minimum Common Tree
19
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
2. Generating the Reduced Trees
First, find the minimum common tree pair that contains the tracing blocks.
Second, prune away sub-trees that are intuitively unnecessary (in a rule-based fashion).
For each Fix Node, all of its ancestor nodes, except the nodes that lie on the path from the root to the tracing block, should be cut off.
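The first part of this step, locating the smallest common sub-tree that contains the tracing block, can be illustrated as follows. This Python sketch is a simplified reading: it walks up from the block in the original tree and stops at the nearest node that is a Fix Node, returning that node and its counterpart in the new tree. It does not check that both sub-trees contain the same set of Fix Nodes, and it does not implement the rule-based pruning. The `Node` class and function names are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    """Hypothetical tree node used only for this sketch."""
    tag: str
    children: list = field(default_factory=list)

def path_to(root, target):
    """Nodes on the path from root down to target (empty if absent)."""
    if root is target:
        return [root]
    for child in root.children:
        rest = path_to(child, target)
        if rest:
            return [root] + rest
    return []

def size(n):
    """Number of nodes in the sub-tree rooted at n."""
    return 1 + sum(size(c) for c in n.children)

def minimum_common_subtree(old_root, block, fix_pairs):
    """Return (old sub-tree root, new sub-tree root) for the smallest
    sub-tree that is rooted at a Fix Node and contains the block.
    fix_pairs is a list of (old node, new node) Fix Node pairs."""
    counterpart = {id(o): n for o, n in fix_pairs}
    # Walk from the block upward; the first Fix Node met is the deepest,
    # so its sub-tree is the smallest one containing the block.
    for n in reversed(path_to(old_root, block)):
        if id(n) in counterpart:
            return n, counterpart[id(n)]
    return None
```

Only the returned pair of sub-trees would then be handed to the edit distance computation, which is where the saving in nodes comes from.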
20
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
2. Generating the Reduced Trees
21
Tree mapping algorithm for block tracking
Fixed Sub-tree Based Tracing
3. Mapping on the Reduced Trees
After steps 1 and 2, only the nodes remaining in the minimum common sub-tree are considered by the minimum edit distance algorithm.
22
Experiments and Analysis
Data Set
25-URL dataset, 101 pages for each URL (one version every 30 minutes).
Five users select the blocks that interest them in the first version of each of the 25 URLs.
The users then also mark the evolved blocks in the later 100 versions.
In total, 12,625 blocks are marked (5 users × 25 URLs × 101 versions).
23
Experiments and Analysis
Metrics
Correct Tracing Rate (CTR): correct tracing count / total tracing count. Total count = 12,500 (125 cases × 100 later versions).
Correct Case Rate (CCR): correct case count / total case count. Total cases = 125 (5 users × 25 URLs).
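Both metrics are simple ratios, as the short Python sketch below shows. It assumes that a case is one user's block on one URL and that a case counts as correct only when every tracing in it is correct; the slides do not state the second condition, so treat it as an assumption. The data layout (a dictionary from case key to a list of booleans) is also made up for the example.

```python
def correct_tracing_rate(cases):
    """CTR = correct tracing count / total tracing count."""
    total = sum(len(results) for results in cases.values())
    correct = sum(sum(results) for results in cases.values())
    return correct / total

def correct_case_rate(cases):
    """CCR = correct case count / total case count.
    Assumption: a case is correct only if all of its tracings are."""
    correct_cases = sum(all(results) for results in cases.values())
    return correct_cases / len(cases)

# Toy data: two cases with four traced versions each.
toy = {
    ("user1", "url_a"): [True, True, True, True],
    ("user1", "url_b"): [True, False, True, True],
}
print(correct_tracing_rate(toy))  # 7 of 8 tracings are correct -> 0.875
print(correct_case_rate(toy))     # 1 of 2 cases is fully correct -> 0.5
```

Under this reading CCR is the stricter metric, since a single wrong tracing makes the whole case count as incorrect.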
24
Experiments and Analysis
CTR and CCR
DPF (Direct Path Finding)
TSM (Tag String Matching)
TED (Tree Edit Distance)
FSBT (Fixed Sub-tree Based Tracing)
25
Experiments and Analysis
The size (node count) and change rate (block position) of the web page have little impact on the algorithm.
=> This indicates that the approach scales.
26
Experiments and Analysis
Computational Cost (units: millisecond, kilobyte)
27
Conclusion
A novel application, Homepage Live, for tracing interesting blocks on different web pages has been proposed.
Tree edit distance is used to trace a block when the page is updated.
With the ability to recognize and trace web blocks automatically, sections or gadgets can be developed for personalized homepages.