HIGH PERFORMANCE DATA MINING ON MULTI-CORE SYSTEMS
Service Aggregated Linked Sequential Activities:
GOALS: Increasing numbers of cores are accompanied by a continued data deluge. Develop scalable parallel data mining algorithms with good multicore and cluster performance; understand the software runtime and parallelization method. Use managed code (C#) and package algorithms as services to encourage broad use, assuming experts parallelize the core algorithms.
CURRENT RESULTS: Microsoft CCR supports MPI, dynamic threading and, via DSS, a service model of computing; detailed performance measurements. Speedups of 7.5 or above on 8-core systems for “large problems” with deterministically annealed (avoiding local minima) algorithms for clustering, Gaussian mixtures, GTM (dimension reduction); extending to new algorithms/applications.
SALSA Team: Geoffrey Fox, Xiaohong Qiu, Huapeng Yuan, Seung-Hee Bae (Indiana University)
Technology Collaboration: George Chrysanthakopoulos, Henrik Frystyk Nielsen (Microsoft)
Application Collaboration: Cheminformatics (Rajarshi Guha, David Wild); Bioinformatics (Haiku Tang); Demographics/GIS (Neil Devadasan, Indiana University and IUPUI)
SALSA
Speedup = Number of cores / (1 + f), where f = (Sum of Overheads) / (Computation per core)
Computation ∝ Grain Size n × # Clusters K
Overheads are:
Synchronization: small with CCR
Load Balance: good
Memory Bandwidth Limit: → 0 as K → ∞
Cache Use/Interference: important
Runtime Fluctuations: dominant for large n, K
All our “real” problems have f ≤ 0.05 and speedups on 8 core systems greater than 7.6
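The speedup model above is easy to check numerically. A minimal Python sketch (illustrative only; the function name is mine, and only the f ≤ 0.05 bound comes from the slide above):

```python
def speedup(num_cores, f):
    """Speedup = (number of cores) / (1 + f), where f is the fractional
    overhead: (sum of overheads) / (computation per core)."""
    return num_cores / (1.0 + f)

# With the bound f <= 0.05 quoted above, 8 cores give at least
# 8 / 1.05, i.e. a speedup slightly above 7.6.
```

Plugging in f = 0.05 reproduces the quoted figure: 8 / 1.05 ≈ 7.62 > 7.6.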
MPI Exchange Latency in µs (20–30 µs computation between messaging)

Machine                          OS      Runtime        Grains    Parallelism   MPI Latency (µs)
Intel8c:gf12                     Redhat  MPJE (Java)    Process   8             181
(8 core, 2.33 GHz, in 2 chips)           MPICH2 (C)     Process   8             40.0
                                         MPICH2: Fast   Process   8             39.3
                                         Nemesis        Process   8             4.21
Intel8c:gf20                     Fedora  MPJE           Process   8             157
(8 core, 2.33 GHz)                       mpiJava        Process   8             111
                                         MPICH2         Process   8             64.2
Intel8b (8 core, 2.66 GHz)       Vista   MPJE           Process   8             170
                                 Fedora  MPJE           Process   8             142
                                 Fedora  mpiJava        Process   8             100
                                 Vista   CCR (C#)       Thread    8             20.2
AMD4 (4 core, 2.19 GHz)          XP      MPJE           Process   4             185
                                 Redhat  MPJE           Process   4             152
                                         mpiJava        Process   4             99.4
                                         MPICH2         Process   4             39.3
                                 XP      CCR            Thread    4             16.3
Intel (4 core)                   XP      CCR            Thread    4             25.8
[Figure: DA Clustering Performance. Fractional overhead f (0.1 to 0.4) plotted against 10000/(Grain Size n) from 0 to 4, for K = 10, 20 and 30 clusters. Runtime fluctuations contribute 2% to 5% overhead.]
[Diagram: “Main Thread” with memory M; eight subsidiary threads 0–7, each thread t with its own local memory m_t.]
Use data decomposition as in classic distributed memory, but use shared memory for read variables. Each thread uses a “local” array for written variables to get good cache performance.
Parallel Programming Strategy
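The strategy above (shared read-only input, thread-local arrays for written variables, a reduction at the end) can be sketched as follows. Python threads stand in here for the C#/CCR implementation, so the function names and toy workload are illustrative, not the SALSA code:

```python
import threading

def partial_sums(data, k_assign, K, local_out):
    # Each thread reads the shared input but writes only to its own
    # "local" array, which is what gives good cache behaviour in the
    # real shared-memory setting.
    for x, k in zip(data, k_assign):
        local_out[k] += x

def parallel_cluster_sums(data, k_assign, K, num_threads=4):
    # Data decomposition: each thread gets a contiguous chunk of points.
    chunk = (len(data) + num_threads - 1) // num_threads
    local_arrays = [[0.0] * K for _ in range(num_threads)]
    threads = []
    for t in range(num_threads):
        lo, hi = t * chunk, min((t + 1) * chunk, len(data))
        th = threading.Thread(
            target=partial_sums,
            args=(data[lo:hi], k_assign[lo:hi], K, local_arrays[t]))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    # Reduction: combine the per-thread arrays after the parallel step.
    return [sum(loc[k] for loc in local_arrays) for k in range(K)]
```

For example, `parallel_cluster_sums([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1], 2)` returns the per-cluster sums `[4.0, 6.0]`; no locks are needed because threads never write to shared state until the final sequential reduction.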
Deterministic Annealing Clustering of Indiana Census Data: decrease temperature (distance scale) to discover more clusters.

[Figure: two maps at resolution T = 0.5; cluster labels r: Renters, a: Asian, h: Hispanic, p: Total.]
GTM projection of 2 clusters of 335 compounds in 155 dimensions.

Stop Press: GTM projection of PubChem: 10,926,94 compounds in a 166-dimension binary property space takes 4 days on 8 cores. A 64×64 mesh of GTM clusters interpolates PubChem. Could usefully use 1024 cores! David Wild will use this for a GIS-style 2D browsing interface to chemistry.
Bioinformatics: annealed clustering and Euclidean embedding for repetitive sequences and gene/protein families. Use GTM to replace PCA in structure analysis.

[Figure: Linear PCA vs. nonlinear GTM on 6 Gaussians in 3D.]
GENERAL FORMULA (DAC, GM, GTM, DAGTM, DAGM)

F(T) = −T Σx=1..N a(x) ln{ Σk=1..K g(k) exp[ −0.5 (E(x) − Y(k))² / (T s(k)) ] }

N data points E(x) in D-dimensional space; minimize F by EM.
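The general free energy F(T) = −T Σx a(x) ln{ Σk g(k) exp[−0.5 (E(x) − Y(k))² / (T s(k))] } can be evaluated directly. A minimal 1-D Python sketch (illustrative, not the authors' C# implementation; the defaults use the DAC choices a(x) = 1/N, g(k) = 1, s(k) = 0.5):

```python
import math

def free_energy(T, E, Y, a=None, g=None, s=None):
    """F(T) = -T * sum_x a(x) * ln( sum_k g(k)
    * exp(-0.5 * (E(x) - Y(k))**2 / (T * s(k))) ), 1-D case."""
    N, K = len(E), len(Y)
    a = a if a is not None else [1.0 / N] * N   # DAC choice: a(x) = 1/N
    g = g if g is not None else [1.0] * K       # DAC choice: g(k) = 1
    s = s if s is not None else [0.5] * K       # DAC choice: s(k) = 0.5
    total = 0.0
    for x in range(N):
        inner = sum(g[k] * math.exp(-0.5 * (E[x] - Y[k]) ** 2 / (T * s[k]))
                    for k in range(K))
        total += a[x] * math.log(inner)
    return -T * total
```

EM then alternates computing cluster responsibilities from these exponentials with re-estimating the Y(k); lowering T sharpens the effective distance scale, which is what triggers the phase transitions that split clusters.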
• Link of CCR and MPI (or cross-cluster CCR)
• Linear Algebra for C#: multiplication, SVD, equation solve
• High Performance C# Math Libraries
Deterministic Annealing Clustering (DAC)
• a(x) = 1/N, or generally p(x) with Σx p(x) = 1
• g(k) = 1 and s(k) = 0.5
• T is the annealing temperature, varied down from ∞ to a final value of 1
• Vary the cluster centers Y(k), but P_k and σ(k) (even for a matrix Σ(k)) can be calculated using IDENTICAL formulae to those for Gaussian mixtures
• K starts at 1 and is incremented by the algorithm
Generative Topographic Mapping (GTM)
• a(x) = 1 and g(k) = (1/K)(β/2π)^(D/2)
• s(k) = 1/β and T = 1
• Y(k) = Σm=1..M Wm Φm(X(k))
• Choose fixed Φm(X) = exp(−0.5 (X − μm)² / σ²)
• Vary Wm and β, but fix the values of M and K a priori
• Y(k), E(x), Wm are vectors in the original high-D dimension space
• X(k) and μm are vectors in the 2-dim mapped space
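GTM's map Y(k) = Σm Wm Φm(X(k)) with fixed Gaussian basis functions is simple to write down. A scalar Python sketch (illustrative; in real GTM the latent points X(k) and centers μm are 2-D, the weights Wm live in the high-D data space, and Wm and β are fitted by EM):

```python
import math

def phi(X, mu, sigma):
    # Fixed Gaussian basis function: Phi_m(X) = exp(-0.5 * (X - mu_m)^2 / sigma^2)
    return math.exp(-0.5 * (X - mu) ** 2 / sigma ** 2)

def gtm_map(Xk, W, mus, sigma):
    """Map a latent point X(k) to data space:
    Y(k) = sum_{m=1..M} W_m * Phi_m(X(k)).
    Scalars here for brevity; vectors in the actual algorithm."""
    return sum(Wm * phi(Xk, mu, sigma) for Wm, mu in zip(W, mus))
```

Because the Φm are fixed, fitting GTM only adjusts the linear weights Wm (and the noise precision β), which is what makes the EM updates tractable.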
We need: Large Windows Cluster
Deterministic Annealing Gaussian Mixture models (DAGM)
• a(x) = 1
• g(k) = {P_k / (2π σ(k)²)^(D/2)}^(1/T)
• s(k) = σ(k)² (taking the case of a spherical Gaussian)
• T is the annealing temperature, varied down from ∞ to a final value of 1
• Vary Y(k), P_k and σ(k)
• K starts at 1 and is incremented by the algorithm

DAGTM: GTM has several natural annealing versions based on either DAC or DAGM; under investigation.

Traditional Gaussian mixture models (GM): as DAGM, but set T = 1 and fix K.
Principal Component Analysis (PCA)

Near-Term Future Work: Parallel Algorithms for
• Random Projection Metric Embedding (Bourgain)
• MDS Dimensional Scaling (EM-like SMACOF)
• Marquardt Algorithm for Newton's Method

Later: HMM and SVM; other embeddings; parallel dimensional scaling and metric embedding; generalized cluster analysis.