Further Investigations on Heat Diffusion Models
Haixuan Yang
Supervisors: Prof Irwin King and Prof Michael R. Lyu
Term Presentation 2006
Outline
Introduction
Input Improvement: three candidate graphs
Outside Improvement: DiffusionRank
Inside Improvement: volume-based heat diffusion model
Summary
Introduction
PHDC: the model proposed last year
Three improvements on PHDC:
Input Improvement: HDM on Graphs
Outside Improvement: DiffusionRank
Inside Improvement: Volume-based HDM
PHDC
PHDC is a classifier motivated by:
Tenenbaum et al. (Science 2000): approximate the manifold by a KNN graph; reduce dimension by shortest paths.
Kondor & Lafferty (NIPS 2002): construct a diffusion kernel on an undirected graph; apply it to a large margin classifier.
Belkin & Niyogi (Neural Computation 2003): approximate the manifold by a KNN graph; reduce dimension by heat kernels.
Lafferty & Kondor (JMLR 2005): construct a diffusion kernel on a special manifold; apply it to SVM.
PHDC
Ideas we inherit:
Local information is relatively accurate in a nonlinear manifold.
Heat diffusion on a manifold is a generalization of the Gaussian density from Euclidean space to a manifold: heat diffuses in the same way as the Gaussian density in the ideal case when the manifold is the Euclidean space.
Ideas where we differ:
Establish the heat diffusion equation on a weighted directed graph. The broader setting enables its application to ranking Web pages.
Construct a classifier from the solution directly.
Heat Diffusion Model in PHDC
Notations
$G = (V, E, W)$: a given directed weighted graph, where $V = \{1, 2, \ldots, n\}$, $E = \{(i, j) : \text{there is an edge from } i \text{ to } j\}$, and $W = (w_{ij})$ is the weight matrix.
$f(i, t)$: the heat at node $i$ at time $t$.
Solution
$f(t) = e^{tH} f(0)$
Classifier
G is the KNN graph: connect a directed edge (j, i) if j is one of the K nearest neighbors of i.
For each class k, f(i, 0) is set to 1 if data point i is labeled as k and 0 otherwise.
Assign data point j to the label q if j receives the most heat from the data in class q.
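The classifier described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the slides do not give the entries of H, so `diffusion_matrix` uses a hypothetical choice (heat flows along edge (j, i) at rate exp(-d²/beta), with each diagonal entry making its row sum to zero), and `heat_kernel` evaluates e^{tH} with a truncated Taylor series, which is adequate only for small, well-scaled matrices. All function names and the parameter `beta` are illustrative.

```python
import math

def knn_edges(points, k):
    """Directed edges (j, i): j is one of the k nearest neighbors of i."""
    edges = {}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            edges[(j, i)] = d
    return edges

def diffusion_matrix(n, edges, beta=1.0):
    """ASSUMPTION (not from the slides): heat flows along edge (j, i) at
    rate exp(-d^2 / beta); each diagonal entry makes its row sum to zero."""
    H = [[0.0] * n for _ in range(n)]
    for (j, i), d in edges.items():
        rate = math.exp(-d * d / beta)
        H[i][j] += rate
        H[i][i] -= rate
    return H

def heat_kernel(H, t, terms=40):
    """Truncated Taylor series for e^{tH}."""
    n = len(H)
    kernel = [[float(r == c) for c in range(n)] for r in range(n)]
    term = [row[:] for row in kernel]
    for m in range(1, terms):
        # term_m = term_{m-1} * H * t / m
        term = [[sum(term[r][x] * H[x][c] for x in range(n)) * t / m
                 for c in range(n)] for r in range(n)]
        kernel = [[kernel[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return kernel

def classify(points, labels, k=2, t=1.0):
    """labels maps some point indices to a class; returns predictions for the rest."""
    n = len(points)
    kernel = heat_kernel(diffusion_matrix(n, knn_edges(points, k)), t)
    classes = sorted(set(labels.values()))
    predictions = {}
    for i in range(n):
        if i in labels:
            continue
        # Heat received by i when the labeled points of class c start at temperature 1.
        heat = {c: sum(kernel[i][j] for j, lab in labels.items() if lab == c)
                for c in classes}
        predictions[i] = max(classes, key=lambda c: heat[c])
    return predictions

# Two well-separated clusters with one labeled point each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.0), (1.2, 0.0)]
predictions = classify(points, {0: "a", 5: "b"})
```

In this toy input the KNN graph splits into two components, so each unlabeled point receives heat only from the labeled point in its own cluster.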
Input Improvement
Three candidate graphs
KNN Graph: connect a directed edge from j to i if j is one of the K nearest neighbors of i, measured by the Euclidean distance.
SKNN-Graph: choose the smallest K*n/2 undirected edges, which amounts to K*n directed edges.
Minimum Spanning Tree: choose the subgraph such that it is a tree connecting all vertices and the sum of its weights is minimum among all such trees.
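The three candidate graphs can be built as in the following sketch. It assumes Euclidean distances between points given as tuples and uses Kruskal's algorithm for the spanning tree; the function names are illustrative and the slides do not say which spanning-tree algorithm was used.

```python
import math
from itertools import combinations

def knn_graph(points, k):
    """Directed edges (j, i) where j is one of the k nearest neighbors of i."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.update((j, i) for _, j in nearest[:k])
    return edges

def sorted_pairs(points):
    """All undirected pairs (i, j) with i < j, from shortest to longest."""
    return sorted(combinations(range(len(points)), 2),
                  key=lambda e: math.dist(points[e[0]], points[e[1]]))

def sknn_graph(points, k):
    """The K*n/2 shortest undirected edges."""
    return set(sorted_pairs(points)[: (k * len(points)) // 2])

def minimum_spanning_tree(points):
    """Kruskal's algorithm with a simple union-find."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    for i, j in sorted_pairs(points):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.add((i, j))
    return tree

# Four points on a line; the last one is far from the others.
points = [(0.0,), (1.0,), (2.5,), (10.0,)]
knn = knn_graph(points, 1)
sknn = sknn_graph(points, 1)
mst = minimum_spanning_tree(points)
```

With K=1 the SKNN-Graph keeps only the two shortest edges and leaves the distant point isolated, while the spanning tree is forced to include the long edge that connects it.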
Input Improvement Illustration
Manifold KNN Graph SKNN-Graph Minimum Spanning Tree
Input Improvement Advantages and disadvantages
KNN Graph: democratic to each node; the resulting classifier is a generalization of KNN; may not be connected; long edges may exist while short edges are removed.
SKNN-Graph: not democratic; may not be connected; short edges are more important than long edges.
Minimum Spanning Tree: not democratic; long edges may exist while short edges are removed; connection is guaranteed; fewer parameters; faster in training and testing.
Experiments
Experimental Setup
Experimental environment: Hardware: Nix Dual Intel Xeon 2.2GHz; OS: Linux Kernel 2.4.18-27smp (RedHat 7.3); Developing tool: C
Data description: 3 artificial data sets and 6 datasets from UCI
Comparison algorithms: Parzen window, KNN, SVM, KNN-H, SKNN-H, MST-H
Results: average of the ten-fold cross validation

| Dataset | Cases | Classes | Variables |
|---|---|---|---|
| Syn-1 | 100 | 2 | 2 |
| Syn-2 | 100 | 2 | 3 |
| Syn-3 | 200 | 2 | 3 |
| Breast-w | 683 | 2 | 9 |
| Glass | 214 | 6 | 9 |
| Iono | 351 | 2 | 34 |
| Iris | 150 | 3 | 4 |
| Sonar | 208 | 2 | 60 |
| Vehicle | 846 | 4 | 18 |
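Experiments
Results

| Dataset | SVM | KNN | PWA | KNN-H | MST-H | SKNN-H |
|---|---|---|---|---|---|---|
| Syn-1 | 66.0 | 67.0 | 80.0 | 93.0 | 95.0 | 95.0 |
| Syn-2 | 34.0 | 67.0 | 83.0 | 94.0 | 94.0 | 89.0 |
| Syn-3 | 54.0 | 79.5 | 92.0 | 91.0 | 90.0 | 92.0 |
| Breast-w | 96.8 | 94.1 | 96.6 | 96.9 | 95.9 | 99.4 |
| Glass | 68.1 | 61.2 | 63.5 | 68.1 | 68.7 | 70.5 |
| Iono | 93.7 | 83.2 | 89.2 | 96.3 | 96.3 | 96.3 |
| Iris | 96.0 | 97.3 | 95.3 | 98.0 | 92.0 | 94.7 |
| Sonar | 88.5 | 80.3 | 53.9 | 90.9 | 91.8 | 94.7 |
| Vehicle | 84.8 | 63.0 | 66.0 | 65.5 | 83.5 | 66.6 |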
Conclusions
KNN-H, SKNN-H and MST-H are candidates for the heat diffusion classifier on a graph.
Application Improvement
PageRank
Tries to find the importance of a Web page based on the link structure.
The importance of a page i is defined recursively in terms of the pages which point to it.
Two problems:
The incomplete information about the Web structure.
The Web pages manipulated by people for commercial interests: about 70% of all pages in the .biz domain are spam, and about 35% of the pages in the .us domain belong to the spam category.
Why is PageRank susceptible to Web spam?
Two reasons:
Over-democratic: all pages are born equal. Each page has equal voting ability: the sum of each column is equal to one.
Input-independent: for any given non-zero initial input, the iteration will converge to the same stable distribution.
Heat Diffusion Model: a natural way to avoid these two weaknesses of PageRank
Points are not equal, as some points are born with high temperatures while others are born with low temperatures.
Different initial temperature distributions will give rise to different temperature distributions after a fixed time period.
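The two properties attributed to PageRank above can be checked on a toy graph. The sketch below is a minimal power iteration, assuming the common damped formulation with damping factor 0.85 (the slides do not state the exact variant) and a graph without dangling pages; `link_matrix` and `pagerank` are illustrative names.

```python
def link_matrix(n, links):
    """Column-stochastic A: A[i][j] = 1/outdegree(j) if page j links to page i."""
    out_degree = [0] * n
    for j, _ in links:
        out_degree[j] += 1
    A = [[0.0] * n for _ in range(n)]
    for j, i in links:
        A[i][j] = 1.0 / out_degree[j]
    return A

def pagerank(A, start, damping=0.85, iterations=200):
    """Power iteration from a non-zero, non-negative start vector.
    ASSUMPTION: damped PageRank with a uniform teleport term."""
    n = len(A)
    total = sum(start)
    x = [v / total for v in start]  # normalize so the entries sum to one
    for _ in range(iterations):
        x = [damping * sum(A[i][j] * x[j] for j in range(n)) + (1 - damping) / n
             for i in range(n)]
    return x

links = [(0, 1), (0, 2), (1, 2), (2, 0)]  # (j, i): page j links to page i
A = link_matrix(3, links)
rank_a = pagerank(A, [1.0, 0.0, 0.0])
rank_b = pagerank(A, [0.0, 0.2, 0.8])
```

Every column of `A` sums to one (each page casts one vote in total), and the two very different start vectors end at the same ranking, which is the input independence that the heat diffusion model avoids by stopping after a fixed time.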
DiffusionRank
On an undirected graph
Assumption: the amount of heat flowing from j to i is proportional to the heat difference between i and j.
Solution:
On a directed graph
Assumption: there is extra energy imposed on the link (j, i) such that heat flows only from j to i if there is no link (i, j).
Solution:
On a random directed graph
Assumption: the heat flow is proportional to the probability of the link (j, i).
Solution:
DiffusionRank On a random directed graph
Solution:
The initial value f(i,0) in f(0) is set to be 1 if i is trusted and 0 otherwise according to the inverse PageRank.
Computation consideration Approximation of heat kernel
N=? When N>=30, the real eigenvalues of are
less than 0.01; when N>=100, they are less than 0.005. We use N=100 in the paper.
When N tends to infinity
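As a hedged illustration of the role of N, the sketch below assumes that the approximation in question is the standard limit (I + (γ/N)H)^N → e^{γH} as N tends to infinity, and compares it with a truncated Taylor series of e^{γH}. The matrix H = A − I for a three-page toy graph is a hypothetical stand-in; the slides do not reproduce the actual H.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[r][x] * B[x][c] for x in range(n)) for c in range(n)]
            for r in range(n)]

def identity(n):
    return [[float(r == c) for c in range(n)] for r in range(n)]

def expm_taylor(H, gamma, terms=40):
    """Reference value of e^{gamma H} by a truncated Taylor series."""
    n = len(H)
    result, term = identity(n), identity(n)
    for m in range(1, terms):
        term = [[v * gamma / m for v in row] for row in matmul(term, H)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

def power_approx(H, gamma, N):
    """(I + (gamma/N) H)^N by repeated multiplication."""
    n = len(H)
    base = [[float(r == c) + gamma / N * H[r][c] for c in range(n)] for r in range(n)]
    result = identity(n)
    for _ in range(N):
        result = matmul(result, base)
    return result

def max_abs_diff(A, B):
    return max(abs(a - b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

# Hypothetical toy example: A is a column-stochastic link matrix, H = A - I.
A = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 1.0, 0.0]]
H = [[A[r][c] - (r == c) for c in range(3)] for r in range(3)]
exact = expm_taylor(H, 1.0)
err30 = max_abs_diff(power_approx(H, 1.0, 30), exact)
err100 = max_abs_diff(power_approx(H, 1.0, 100), exact)
```

On this toy matrix the entrywise error shrinks as N grows, roughly in proportion to 1/N, which is the trade-off behind choosing a fixed N such as 100.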
Discuss γ
γ can be understood as the thermal conductivity.
When γ=0, the ranking value is most robust to manipulation since no heat is diffused, but the Web structure is completely ignored.
When γ=∞, DiffusionRank becomes PageRank, which can be manipulated easily.
When γ=1, DiffusionRank works well in practice.
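The effect of γ can be illustrated on a toy graph. This sketch assumes H = A − I for a column-stochastic link matrix A and uses the iteration f ← f + (γ/N)(Af − f), repeated N times, as an approximation of e^{γH} f(0); both choices are assumptions made for illustration, and in this setting the large-γ limit is the undamped stationary distribution of A.

```python
def diffusion_rank(A, f0, gamma, N=2000):
    """Approximate e^{gamma (A - I)} f0 by N small explicit steps.
    ASSUMPTION: H = A - I; the slides do not reproduce the actual H."""
    n = len(A)
    f = list(f0)
    step = gamma / N
    for _ in range(N):
        Af = [sum(A[i][j] * f[j] for j in range(n)) for i in range(n)]
        f = [f[i] + step * (Af[i] - f[i]) for i in range(n)]
    return f

# Toy graph: page 0 links to pages 1 and 2, page 1 links to 2, page 2 links to 0.
A = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 1.0, 0.0]]
f0 = [1.0, 0.0, 0.0]  # page 0 is the only trusted page

no_diffusion = diffusion_rank(A, f0, 0.0)    # gamma = 0: the initial heat is unchanged
moderate = diffusion_rank(A, f0, 1.0)        # gamma = 1: the trusted page still dominates
near_pagerank = diffusion_rank(A, f0, 20.0)  # large gamma: close to the stationary vector
```

For this graph the stationary vector of A is (0.4, 0.2, 0.4), so the large-γ result no longer depends on which page was trusted, while γ=1 keeps more heat on the trusted page.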
DiffusionRank
Advantages
Can detect group-group relations
Can cut graphs
Anti-manipulation
(Figure labels: +1, -1; γ = 0.5 or 1)
DiffusionRank
Experiments
Data:
a toy graph (6 nodes)
a middle-size real-world graph (18542 nodes)
a large-size real-world graph crawled from CUHK (607170 nodes)
Compare with TrustRank and PageRank
Results
The tendency of DiffusionRank when γ becomes larger, on the toy graph
Anti-manipulation On the toy graph
Anti-manipulation on the middle graph and the large graph
Stability: the order difference between an algorithm's ranking results before and after it is manipulated
Conclusions
This anti-manipulation feature enables DiffusionRank to be a candidate as a penicillin for Web spamming.
DiffusionRank is a generalization of PageRank (when γ=∞).
DiffusionRank can be employed to detect group-group relations.
DiffusionRank can be used to cut graphs.
Inside Improvement
Motivations
The Finite Difference Method is a possible way to solve the heat diffusion equation.
(Figure panels: the discretization of time; the discretization of space and time)
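For a regular grid, the finite difference idea can be sketched as follows. The example assumes the one-dimensional heat equation u_t = α u_xx, the explicit forward-time, centered-space scheme, and insulated ends; these choices and the name `diffuse_1d` are illustrative and not taken from the slides.

```python
def diffuse_1d(u, r, steps):
    """Explicit finite differences for u_t = alpha * u_xx on a regular grid.
    r = alpha * dt / dx^2 combines the time and space discretization."""
    if not 0 < r <= 0.5:
        raise ValueError("r must be in (0, 0.5] for the explicit scheme to be stable")
    u = list(u)
    n = len(u)
    for _ in range(steps):
        # Insulated ends: the ghost cell outside each boundary copies the boundary value.
        left = [u[0]] + u[:-1]
        right = u[1:] + [u[-1]]
        u = [u[i] + r * (left[i] - 2 * u[i] + right[i]) for i in range(n)]
    return u

# A unit of heat placed in the middle of an 11-cell rod.
initial = [0.0] * 5 + [1.0] + [0.0] * 5
final = diffuse_1d(initial, r=0.25, steps=50)
```

The update relies on every cell having a left and a right neighbor at the same spacing, which is exactly what an irregular neighborhood graph built from data does not provide.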
Motivation
Problems where we cannot employ FD directly in real data analysis:
The graph constructed is irregular;
The density of data varies; this also results in an irregular graph;
The manifold is unknown;
The differential equation expression is unknown even if the manifold is known.
Intuition
Volume-based Heat Diffusion Model
Assumption
There is a small patch SP[j] of space containing node j;
The volume of the small patch SP[j] is V(j), and the heat diffusion ability of the small patch is proportional to its volume.
The temperature in the small patch SP[j] at time t is almost equal to f(j,t) because every unseen node in the small patch is near node j.
Solution
Volume Computation
Define V(i) to be the volume of the hypercube whose side length is the average distance between node i and its neighbors.
a maximum likelihood estimation
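The volume definition can be sketched as below. The sketch assumes that the hypercube's dimension is supplied by the caller as `v` (for instance the number of variables, or an estimated dimension) and that each node's neighbor list is already available; `patch_volumes` and its arguments are hypothetical names.

```python
import math

def patch_volumes(points, neighbors, v):
    """V(i) = (average distance from node i to its neighbors) ** v.
    ASSUMPTION: v is the dimension of the hypercube, chosen by the caller."""
    volumes = []
    for i, p in enumerate(points):
        side = sum(math.dist(p, points[j]) for j in neighbors[i]) / len(neighbors[i])
        volumes.append(side ** v)
    return volumes

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
neighbors = {0: [1, 2], 1: [0], 2: [0]}  # neighbor indices for each node
volumes = patch_volumes(points, neighbors, v=2)
```

Nodes in sparse regions have longer average edges and therefore larger volumes, which is how the model accounts for varying data density.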
Experiments
K: KNN
P: Parzen window
U: UniverSvm
L: LightSVM
C: consistency method
VHD-v: by the best v
VHD: v is found by the estimation
HD: without volume consideration
C1: 1st variation of C
C2: 2nd variation of C
Conclusions
The proposed VHDM has the following advantages:
It can model the effect of unseen points by introducing the volume of a node;
It avoids the difficulty of finding the explicit expression for the unknown geometry by approximating the manifold by a finite neighborhood graph;
It has a closed form solution that describes the heat diffusion on a manifold;
VHDC is a generalization of both the Parzen Window Approach (when the window function is a multivariate normal kernel) and KNN.
Summary
The input improvement of PHDC provides us with more choices for the input graphs.
The outside improvement provides us with a possible penicillin for Web spamming, and a potentially useful tool for group-group discovery and graph cut.
The inside improvement shows us a promising classifier.