Presentation HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization, by Huming Zhu at the AMD Developer Summit (APU13), November 11-13, 2013.
Complex Network Clustering Using GPU-based Parallel Non-negative Matrix Factorization
Huming Zhu, Maoguo Gong, Baolin Huang
Xidian University, 2013.11
OpenCL course (course IDs 0222277, 0242277): OpenCL Programming and Practice, 2011, 2012, 2013
Contents
1 Complex Network Clustering of NMF
2 Parallel Bayesian NMF on GPU
3 Sparse BNMF on GPU
4 Experiment
5 Conclusion
5 Xidian University 12/7/13 5
Complex Network Clustering
* All pictures are from the Internet.
Complex Network Clustering
Network clustering aims to divide a network into several communities, such that the number of edges linking nodes within the same community is higher than the number of edges joining nodes in different communities.
• Network clustering is essential for understanding how a network is organized and functions.
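The community criterion above can be checked directly for a given partition by counting edges. A minimal C sketch, assuming an undirected edge list (u[e], v[e]) and one community label per node in comm[] (count_edges and its parameters are illustrative names, not from the talk):

```c
#include <assert.h>

/* Counts edges whose endpoints share a community label (intra) and edges
 * that cross between communities (inter), for an undirected edge list
 * (u[e], v[e]), e < m, with comm[node] giving each node's community. */
void count_edges(const int *u, const int *v, int m, const int *comm,
                 int *intra, int *inter)
{
    assert(u && v && comm && intra && inter);
    *intra = 0;
    *inter = 0;
    for (int e = 0; e < m; ++e) {
        if (comm[u[e]] == comm[v[e]])
            ++*intra;   /* both ends in the same community */
        else
            ++*inter;   /* edge joins two different communities */
    }
}
```

For two triangles joined by a single edge, with each triangle labelled as its own community, it reports 6 intra-community edges and 1 inter-community edge, so that partition satisfies the criterion.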
Non-negative Matrix Factorization (NMF)
• Powerful interpretability and a close relationship with clustering methods.
• Requires a lot of computation power.
• The NMF problem is defined as searching for an approximation of the non-negative matrix A with respect to some metric (e.g., a norm of A − W × H) by factoring A into the product W × H of two lower-dimensional non-negative matrices W and H.
• NMF has been applied in many areas, such as image processing [1].
[1] D. D. Lee, H. S. Seung: Learning the parts of objects by non-negative matrix factorization. Nature 401, pp. 788–791 (1999).
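To make the factorization concrete: with the Frobenius norm as the metric, NMF minimizes ‖A − WH‖²_F subject to W, H ≥ 0, and the standard Lee and Seung multiplicative updates are H ← H ∘ (WᵀA) ⊘ (WᵀWH) followed by W ← W ∘ (AHᵀ) ⊘ (WHHᵀ). Below is a dense, serial C sketch of one such iteration. It is plain NMF, not the Bayesian variant (BNMF) the talk parallelizes; the row-major layout, the eps guard, and the function names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* WH = W * H for row-major W (n x r) and H (r x m). */
static void mat_wh(const float *W, const float *H, float *WH,
                   size_t n, size_t m, size_t r)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j) {
            float s = 0.0f;
            for (size_t k = 0; k < r; ++k)
                s += W[i * r + k] * H[k * m + j];
            WH[i * m + j] = s;
        }
}

/* One multiplicative-update iteration for min ||A - W H||_F^2, with
 * non-negative row-major A (n x m), W (n x r), H (r x m):
 *   H <- H .* (W^T A) ./ (W^T W H),  then  W <- W .* (A H^T) ./ (W H H^T).
 * Entries stay non-negative because every factor is non-negative. */
void nmf_mu_step(const float *A, float *W, float *H,
                 size_t n, size_t m, size_t r)
{
    assert(A && W && H);
    const float eps = 1e-9f;            /* keeps denominators positive */
    float *WH = malloc(n * m * sizeof *WH);
    if (!WH)
        abort();

    mat_wh(W, H, WH, n, m, r);          /* W H with the old H */
    for (size_t k = 0; k < r; ++k)
        for (size_t j = 0; j < m; ++j) {
            float num = 0.0f, den = 0.0f;
            for (size_t i = 0; i < n; ++i) {
                num += W[i * r + k] * A[i * m + j];   /* (W^T A)[k][j]   */
                den += W[i * r + k] * WH[i * m + j];  /* (W^T W H)[k][j] */
            }
            H[k * m + j] *= num / (den + eps);
        }

    mat_wh(W, H, WH, n, m, r);          /* W H with the new H */
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < r; ++k) {
            float num = 0.0f, den = 0.0f;
            for (size_t j = 0; j < m; ++j) {
                num += A[i * m + j] * H[k * m + j];   /* (A H^T)[i][k]   */
                den += WH[i * m + j] * H[k * m + j];  /* (W H H^T)[i][k] */
            }
            W[i * r + k] *= num / (den + eps);
        }
    free(WH);
}
```

Each update reads only the old copy of the other factor through WH, so both factors can be overwritten in place; on a GPU the same structure becomes a handful of matrix-multiply and element-wise kernels.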
Bayesian NMF
Input: Non-negative data (observation) matrix A; fixed hyperparameters a, b.
Output: Non-negative matrices W and H.
Step 1: Initialize W and H to non-negative values.
Step 5: If converged, stop; otherwise, go to Step 2.
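Only Steps 1 and 5 are spelled out above, so the following C sketch fixes just the outer loop: random non-negative initialization, repeated updates, and a stopping test. Steps 2-4 are delegated to a hypothetical callback (bnmf_update_fn), and the convergence criterion (relative change of the objective below tol, plus an iteration cap) is an assumption, not the authors' rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback standing in for Steps 2-4: updates W and H in place
 * and returns the current objective value. */
typedef double (*bnmf_update_fn)(float *W, float *H, void *ctx);

/* Step 1: fill X with random values in (0, 1], so every entry is
 * non-negative (in fact strictly positive). */
void init_nonneg(float *X, size_t len, unsigned seed)
{
    assert(X);
    srand(seed);
    for (size_t i = 0; i < len; ++i)
        X[i] = (float)((rand() + 1.0) / (RAND_MAX + 1.0));
}

/* Step 5 test (assumed criterion): relative change of the objective <= tol. */
int bnmf_converged(double prev, double cur, double tol)
{
    double diff = prev - cur, scale = prev;
    if (diff < 0) diff = -diff;
    if (scale < 0) scale = -scale;
    return diff <= tol * scale;
}

/* Runs Step 1, then Steps 2-4 until Step 5 reports convergence or max_iter
 * updates have been made. Returns the number of updates performed. */
int bnmf_run(float *W, size_t w_len, float *H, size_t h_len,
             bnmf_update_fn update, void *ctx, double tol, int max_iter)
{
    assert(W && H && update && max_iter > 0);
    init_nonneg(W, w_len, 1u);
    init_nonneg(H, h_len, 2u);
    double prev = update(W, H, ctx);
    int it = 1;
    while (it < max_iter) {
        double cur = update(W, H, ctx);
        ++it;
        if (bnmf_converged(prev, cur, tol))
            break;          /* Step 5: converged, stop */
        prev = cur;         /* otherwise go back to Step 2 */
    }
    return it;
}
```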
2 Parallel Bayesian NMF on GPU
Parallel Bayesian NMF
• P-BNMF
• Sparse-BNMF
P-BNMF kernels
• Matrix multiplication
• Matrix square sum
Matrix multiplication
• Update matrix: W × H
• Kernel: mat_mult_AB
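The slide names the kernel but not its structure. A common structure, consistent with the {16 16 1} work-group size reported for mat_mult_AB, is the classic tiled product in which each work-group stages 16 × 16 tiles in local memory. Below is a serial C sketch of that tiling, under the assumption that the real kernel is organized this way; mat_mult_tiled is a hypothetical name, and each (bi, bj) output tile plays the role of one work-group.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 16 /* matches the 16 x 16 work-group size reported for mat_mult_AB */

/* C = A * B with A (m x k), B (k x n), C (m x n), all row-major.
 * Each (bi, bj) tile of C corresponds to one 16 x 16 work-group; the bk loop
 * walks the shared dimension one tile at a time, as a GPU kernel does when it
 * stages tiles of A and B in local memory. Edge tiles are guarded. */
void mat_mult_tiled(const float *A, const float *B, float *C,
                    size_t m, size_t k, size_t n)
{
    assert(A && B && C);
    for (size_t i = 0; i < m * n; ++i)
        C[i] = 0.0f;

    for (size_t bi = 0; bi < m; bi += TILE)
        for (size_t bj = 0; bj < n; bj += TILE)
            for (size_t bk = 0; bk < k; bk += TILE)
                for (size_t i = bi; i < bi + TILE && i < m; ++i)
                    for (size_t j = bj; j < bj + TILE && j < n; ++j) {
                        float sum = 0.0f;
                        for (size_t p = bk; p < bk + TILE && p < k; ++p)
                            sum += A[i * k + p] * B[p * n + j];
                        C[i * n + j] += sum;   /* accumulate this tile's part */
                    }
}
```

Tiling does not change the arithmetic; on a GPU it lets each value loaded into local memory be reused by 16 work-items instead of being fetched from global memory 16 times.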
Sum of squares of a matrix
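The CodeXL tables list two reductions, mat_squ_sum_row and mat_squ_sum_col. A minimal serial C sketch of row-wise and column-wise sums of squares follows; the function names echo those kernels, but which matrix and axis each of the authors' kernels actually reduces over is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = sum_j X[i][j]^2 for a row-major rows x cols matrix X. */
void squ_sum_row(const float *X, size_t rows, size_t cols, float *out)
{
    assert(X && out);
    for (size_t i = 0; i < rows; ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < cols; ++j)
            s += X[i * cols + j] * X[i * cols + j];
        out[i] = s;
    }
}

/* out[j] = sum_i X[i][j]^2, the squared norm of each column. */
void squ_sum_col(const float *X, size_t rows, size_t cols, float *out)
{
    assert(X && out);
    for (size_t j = 0; j < cols; ++j)
        out[j] = 0.0f;
    for (size_t i = 0; i < rows; ++i)        /* walk rows for cache locality */
        for (size_t j = 0; j < cols; ++j)
            out[j] += X[i * cols + j] * X[i * cols + j];
}
```

On the GPU, each row or column reduction would map to a group of work-items; the CodeXL tables report work-group sizes of {64 1 1} and {1 64 1} for the two kernels.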
3 Sparse BNMF on GPU
Sparse-BNMF
Problem: GPU memory is only 1 GB, which limits the network size P-BNMF can handle.
Solution: Store the sparse matrix in compressed sparse row (CSR) format, which leads to the proposed Sparse-BNMF.
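To see why a 1 GB card limits P-BNMF, compare the storage of one dense n × n matrix with its CSR form. A minimal C sketch, assuming 4-byte floats and 4-byte indices, and assuming each undirected edge is stored in both directions so that nnz = 2 × edges (dense_bytes and csr_bytes are hypothetical helpers):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for one dense n x n matrix of 4-byte floats. */
uint64_t dense_bytes(uint64_t n)
{
    return n * n * 4u;
}

/* Bytes for the same matrix in CSR with nnz stored entries:
 * Av (nnz 4-byte values) + Aj (nnz 4-byte column indices)
 * + Ap (n + 1 4-byte row pointers). */
uint64_t csr_bytes(uint64_t n, uint64_t nnz)
{
    assert(n > 0);
    return nnz * (4u + 4u) + (n + 1) * 4u;
}
```

For the 50000-vertex LFR network (748337 edges), a single dense matrix needs 10^10 bytes (about 10 GB), roughly ten times the card's memory on its own, while CSR needs 12,173,396 bytes (about 12 MB). How many n × n buffers P-BNMF keeps on the device is not stated, so this compares one matrix only.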
Sparse-BNMF
• CSR: Aj (column indices), Av (non-zero values), Ap (row start positions in Aj and Av)
• CSR column: Aj_column, Av_column, Ap_column
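A minimal C sketch of building both layouts from a small dense matrix. It uses the slide's array names, and it takes "CSR column" to mean the same three arrays built column by column (compressed sparse column storage), which is an assumption; dense_to_csr and dense_to_csr_column are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Row-wise CSR of a dense row-major n_rows x n_cols matrix D:
 * Ap[r]..Ap[r+1]-1 index row r's non-zeros, Aj holds their columns, Av their
 * values. Ap needs n_rows + 1 entries. Returns the number of non-zeros. */
size_t dense_to_csr(const float *D, size_t n_rows, size_t n_cols,
                    unsigned *Ap, unsigned *Aj, float *Av)
{
    assert(D && Ap && Aj && Av);
    size_t nnz = 0;
    for (size_t r = 0; r < n_rows; ++r) {
        Ap[r] = (unsigned)nnz;                  /* start of row r */
        for (size_t c = 0; c < n_cols; ++c)
            if (D[r * n_cols + c] != 0.0f) {
                Aj[nnz] = (unsigned)c;
                Av[nnz] = D[r * n_cols + c];
                ++nnz;
            }
    }
    Ap[n_rows] = (unsigned)nnz;                 /* end of the last row */
    return nnz;
}

/* Column-wise copy: Ap_column[c]..Ap_column[c+1]-1 index column c's
 * non-zeros, Aj_column holds their row indices. Ap_column needs n_cols + 1. */
size_t dense_to_csr_column(const float *D, size_t n_rows, size_t n_cols,
                           unsigned *Ap_column, unsigned *Aj_column,
                           float *Av_column)
{
    assert(D && Ap_column && Aj_column && Av_column);
    size_t nnz = 0;
    for (size_t c = 0; c < n_cols; ++c) {
        Ap_column[c] = (unsigned)nnz;           /* start of column c */
        for (size_t r = 0; r < n_rows; ++r)
            if (D[r * n_cols + c] != 0.0f) {
                Aj_column[nnz] = (unsigned)r;
                Av_column[nnz] = D[r * n_cols + c];
                ++nnz;
            }
    }
    Ap_column[n_cols] = (unsigned)nnz;
    return nnz;
}
```

Keeping both layouts lets a kernel walk either the rows or the columns of A contiguously, which is presumably why separate _csr and _csr_col kernels appear in the CodeXL tables.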
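A_WH_csr kernel (OpenCL C)

    // One 16 x 1 work-group handles up to 16 non-zeros of one row of A (CSR: Ap, Aj)
    // and writes 1 / (W H)[row][col] for each of them.
    // Assumes W is row-major (row_num x widthA), H is row-major (widthA x widthB),
    // and widthA (the number of clusters K) is a multiple of 16.
    __kernel void A_WH_csr(__global const uint *Ap, __global const uint *Aj,
                           __global const float *W, __global const float *H,
                           __global float *Av_result,
                           const uint row_num, const int widthA, const int widthB)
    {
        __local float As[16];                   // 16-wide slice of this row of W
        const uint row = get_global_id(1);
        const uint localid = get_local_id(0);
        const uint groupidx = get_group_id(0);  // the size of group is 16*1
        if (row >= row_num) return;             // uniform: the whole group shares one row

        const uint rowStart = Ap[row];          // start position in Aj of this row
        const uint rowEnd = Ap[row + 1];        // end position of this row
        const uint groupStart = rowStart + groupidx * 16;
        if (groupStart >= rowEnd) return;       // no non-zero value in this group (uniform)

        const uint index = groupStart + localid;     // position of this PE (processing element)
        const bool active = index < rowEnd;          // does this PE own a non-zero?
        const int col = active ? (int)Aj[index] : 0; // never read Aj past the row's end

        float Csub = 0.000001f;                 // small offset avoids division by zero
        const int aStart = widthA * (int)row;
        // H is read directly: each PE only uses its own column, so no local Bs buffer is needed.
        for (int a = aStart, b = col; a < aStart + widthA; a += 16, b += 16 * widthB) {
            As[localid] = W[a + localid];
            barrier(CLK_LOCAL_MEM_FENCE);       // As fully loaded before it is read
            if (active)
                for (int k = 0; k < 16; k++)
                    Csub += H[b + k * widthB] * As[k];
            barrier(CLK_LOCAL_MEM_FENCE);       // all reads done before As is overwritten
        }
        if (active)
            Av_result[index] = 1.0f / Csub;
    }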
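A serial C reference for what A_WH_csr computes can help check GPU output on small inputs. It follows the kernel's indexing (W row-major as n × K, H row-major as K × n) and keeps the same 1e-6 offset on Csub; a_wh_csr_ref is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* For each stored non-zero idx of A in CSR (row r, column Aj[idx]), write
 * 1 / (eps + (W H)[r][Aj[idx]]). Only A's sparsity pattern is visited, so the
 * full n x n product W * H is never formed. */
void a_wh_csr_ref(const unsigned *Ap, const unsigned *Aj, size_t n_rows,
                  const float *W, const float *H, size_t K, size_t n_cols,
                  float *Av_result)
{
    assert(Ap && Aj && W && H && Av_result);
    for (size_t r = 0; r < n_rows; ++r)
        for (unsigned idx = Ap[r]; idx < Ap[r + 1]; ++idx) {
            size_t c = Aj[idx];
            float s = 1e-6f;                    /* same offset as the kernel */
            for (size_t k = 0; k < K; ++k)
                s += W[r * K + k] * H[k * n_cols + c];
            Av_result[idx] = 1.0f / s;
        }
}
```

Because only stored non-zeros are visited, the cost is O(nnz · K) instead of the O(n² · K) needed to form W × H densely, which is where the sparse kernels gain on large, sparse networks.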
4 Experiment
Machine
• Software: AMD Accelerated Parallel Processing (APP) SDK v2.7, OpenCL 1.2; Microsoft Visual Studio 2010
Host
• Product name: HP xw9400 workstation
• OS: Windows 7 x64 Edition
• CPU: 4 × Dual-Core AMD Opteron 2220, 2.80 GHz
• Memory: 32 GB
Device
• Product name: AMD Radeon HD 7770
• Engine speed: 1000 MHz
• Processing elements: 640
• Memory: 1 GB GDDR5
• Memory bandwidth: 72 GB/s
• PCI: PCI Express® 3.0 x16
Synthetic networks (LFR benchmark)
Vertices   Edges    Q
128        1024     0.450
500        5135     0.813
1000       9582     0.904
5000       38007    0.908
10000      148470   0.860
50000      748337   0.900

Real-world networks
Data         Vertices   Edges   Q
Facebook     324        4436    0.620
Email        1133       5451    0.531
Netscience   1461       2742    0.905
Power        4941       6594    0.599
Scientists   6650       59870   0.647
Hep          7610       15751   0.772
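Evaluation: modularity Q [1]
$$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(C_i, C_j)$$
where A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, C_i is the community of node i, and δ(C_i, C_j) = 1 if C_i = C_j and 0 otherwise.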
[1] M. E. J. Newman, M. Girvan: Finding and evaluating community structure in networks. Phys. Rev. E 69 (2), 026113 (2004).
A higher Q indicates a better community structure.
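Grouping the terms of the modularity formula by community gives the equivalent form Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes; this avoids the n × n double sum. A minimal C sketch of that form, assuming an unweighted, undirected edge list and community labels in [0, n_comm) (function and parameter names are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Modularity of a partition of an undirected, unweighted graph given as an
 * edge list (u[e], v[e]), e < m, with comm[node] in [0, n_comm).
 * Per-community form: Q = sum_c [ L_c / m - (d_c / (2m))^2 ]. */
double modularity(const int *u, const int *v, int m,
                  const int *comm, int n_comm)
{
    assert(u && v && comm && m > 0 && n_comm > 0);
    double *L = calloc((size_t)n_comm, sizeof *L);  /* edges inside c   */
    double *d = calloc((size_t)n_comm, sizeof *d);  /* total degree of c */
    if (!L || !d)
        abort();
    for (int e = 0; e < m; ++e) {
        int cu = comm[u[e]], cv = comm[v[e]];
        d[cu] += 1.0;        /* each edge adds one to the degree of both ends */
        d[cv] += 1.0;
        if (cu == cv)
            L[cu] += 1.0;    /* edge lies inside a community */
    }
    double Q = 0.0;
    for (int c = 0; c < n_comm; ++c) {
        double frac = d[c] / (2.0 * m);
        Q += L[c] / m - frac * frac;
    }
    free(L);
    free(d);
    return Q;
}
```

For two triangles joined by one edge, split into the two triangles, each community has L_c = 3 and d_c = 7 out of m = 7 edges, so Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357.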
Network demo
[Figures: Netscience (part); Facebook]
• The Netscience network is a co-authorship network of scientists working on network theory and experiment.
Speedup
Data         Vertices   K     BNMF (s)    P-BNMF (s)   Sparse-BNMF (s)   P-Ratio   Sparse-Ratio
LFR          128        64    4.165       0.166        0.226             4.37      3.1
LFR          500        128   109.9       0.823        1.096             67.63     51.35
LFR          1000       128   712.5       2.98         2.798             187.58    181.6
LFR          5000       128   31031.5     109.96       71.167            279.39    417.21
LFR          10000      128   186321.7    615.09       334.23            302.92    556.2
LFR          50000      128   *           *            8250.28           *         *
Facebook     324        128   46.25       1.328        1.656             34.82     27.93
Email        1133       128   774.4       3.901        3.042             162.24    189.33
Netscience   1461       128   1253.2      6.725        4.628             166.11    215.81
Power        4941       128   26202.4     108.30       61.787            239.29    404.38
Hep          7610       128   76827.2     271.28       152.66            281.75    491.85
Scientists   6650       128   63254.5     208.2        125.55            303.81    503.84
K is the number of clusters; BNMF (s) is the serial run time in seconds; P-Ratio is the speedup of P-BNMF over BNMF; Sparse-Ratio is the speedup of Sparse-BNMF over BNMF.
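Taking each ratio as the serial BNMF time divided by the GPU time, the Scientists row works out as follows, matching the 303.81 and 503.84 listed in the table:

```latex
\[
\text{P-Ratio} = \frac{T_{\text{BNMF}}}{T_{\text{P-BNMF}}} = \frac{63254.5}{208.2} \approx 303.8,
\qquad
\text{Sparse-Ratio} = \frac{T_{\text{BNMF}}}{T_{\text{Sparse-BNMF}}} = \frac{63254.5}{125.55} \approx 503.8
\]
```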
Speedup
• Data: Netscience
• Cluster number K: 64 to 256
• Sparse-BNMF achieves the better speedup.
• Using CodeXL to analyze OpenCL kernels on AMD GPUs
Table 1. P-BNMF kernels
Method             GlobalWorkSize   WorkGroupSize   Time
Update_H           {1472 128 1}     {16 16 1}       6.12726
mat_mult_AB        {1472 1472 1}    {16 16 1}       10.73615
mat_dot_div        {1472 1472 1}    {16 16 1}       3.70267
mat_mult_AtB       {1472 128 1}     {16 16 1}       9.72355
mat_dot_mult       {1472 128 1}     {16 16 1}       0.30133
mat_squ_sum_row    {1472 128 1}     {64 1 1}        0.5483
mat_squ_sum_col    {128 1472 1}     {1 64 1}        7.27985
update_invbeta     {128 1 1}        {4 1 1}         0.03763
Update_W           {128 1472 1}     {16 16 1}       6.25437
mat_mult_AB        {1472 1472 1}    {16 16 1}       10.75037
mat_dot_div        {1472 1472 1}    {16 16 1}       3.64148
mat_mult_ABt       {128 1472 1}     {16 16 1}       9.04222
mat_dot_mult       {128 1472 1}     {16 16 1}       0.2843

Table 2. Sparse-BNMF kernels
Method             GlobalWorkSize   WorkGroupSize   Time
Update_H           {1472 128 1}     {16 16 1}       6.11407
A_WH_csr_col       {1472 1472 1}    {1 16 1}        7.76119
mat_mult_A_s_col   {1461 2048 1}    {1 16 1}        5.36341
mat_dot_mult       {1472 128 1}     {16 16 1}       0.2917
mat_squ_sum_row    {1472 128 1}     {64 1 1}        0.55304
mat_squ_sum_col    {128 1472 1}     {1 64 1}        6.99467
update_invbeta     {128 1 1}        {4 1 1}         0.03748
Update_W           {128 1472 1}     {16 16 1}       6.17718
A_WH_csr           {1472 1472 1}    {16 1 1}        6.29185
mat_mult_s_Bt      {2048 1461 1}    {16 1 1}        5.37615
mat_dot_mult       {128 1472 1}     {16 16 1}       0.27763
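• Table 1: the dense kernels of interest are W × H (mat_mult_AB), the element-wise (dot) matrix operation, and AᵀB (mat_mult_AtB).
• Table 2: the sparse kernels are A_WH_csr_col and mat_mult_A_s_col.
• The CSR-based kernels take less time than their dense counterparts.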
Kernel information provided by CodeXL
P-BNMF vs. Sparse-BNMF
                P-BNMF            Sparse-BNMF
Network size    small (<10000)    big
Speedup         low               high
• The Sparse-BNMF algorithm effectively solves the memory limit problem, which enables it to handle larger-scale networks.
5 Conclusion
Our work
• Presented P-BNMF and Sparse-BNMF
• P-BNMF: parallel Bayesian NMF on the GPU
• Sparse-BNMF: Bayesian NMF with the network stored in CSR format
• Speedup over serial BNMF
Future
• Portability
Thank You!