Parallel Multicategory Support Vector Machines (PMC-SVM)
for Classifying Microarray Data
Graduate student: 許景復
Department: Graduate Institute of Opto-Electronics and Communications (光電與通訊研究所)
Outline
Introduction
SMO-SVM
Parallel Multicategory SVM
Parallel Implementation and Environment
Parallel Evaluation and Analysis
Classifying Microarray Data
Conclusions
Introduction
Biologists want to separate the data into multiple categories using a reliable cancer diagnostic model.
Based on a comprehensive evaluation of several multicategory classification methods, support vector machines (SVM) were found to be the most effective classifiers for accurate cancer diagnosis from gene expression data.
In the paper, we developed new parallel multicategory support vector machines (PMC-SVM) based on the sequential minimal optimization (SMO)-type decomposition method for support vector machines (SMO-SVM) of LibSVM, which needs less memory.
SMO-SVM
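The basic idea behind SVM is to separate the two point classes of a training set

$$D = \{(x_i, y_i),\ i = 1, \dots, N,\ x_i \in R^n,\ y_i \in \{-1, 1\}\}$$

by using a decision function $F: R^n \to \{-1, 1\}$ obtained by solving a convex quadratic programming optimization problem of the form

$$\min_{\alpha} f(\alpha) = \frac{1}{2}\alpha^T Q \alpha - e^T \alpha \qquad (1)$$

subject to

$$0 \le \alpha_i \le C,\ i = 1, \dots, l, \qquad y^T \alpha = 0,$$

with $l = N$, the number of training points.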
SMO-SVM
where $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_N]^T$, $y = [y_1, y_2, \dots, y_N]^T$, and $C$ is a constant. $e$ is a vector of all ones. $Q$ is a symmetric positive semidefinite matrix whose entries $Q_{i,j}$ are defined as

$$Q_{i,j} = y_i y_j K(x_i, x_j), \qquad i, j = 1, 2, \dots, N, \qquad (3)$$

where $K(\cdot, \cdot)$ denotes a kernel function, such as a polynomial kernel or a Gaussian kernel.
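As a minimal sketch of (1) and (3), the following pure-Python code builds $Q$ from a kernel and evaluates $f(\alpha)$. The function names (`polynomial_kernel`, `gaussian_kernel`, `build_q`, `objective`), the kernel parameters (`degree`, `coef0`, `gamma`) and the toy data are assumptions made for illustration; they do not come from the slides.

```python
import math

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    # (u . v + coef0)^degree; degree and coef0 are illustrative defaults
    return (sum(a * b for a, b in zip(u, v)) + coef0) ** degree

def gaussian_kernel(u, v, gamma=0.5):
    # exp(-gamma * ||u - v||^2); gamma is an illustrative default
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def build_q(X, y, kernel):
    # Q[i][j] = y_i * y_j * K(x_i, x_j), as in (3)
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)]
            for i in range(n)]

def objective(Q, alpha):
    # f(alpha) = 1/2 * alpha^T Q alpha - e^T alpha, as in (1)
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

# Toy data (hypothetical): three points, two classes
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
y = [-1, 1, 1]
Q = build_q(X, y, gaussian_kernel)
alpha = [0.5, 0.25, 0.25]  # feasible: y^T alpha = 0 and 0 <= alpha_i <= C for C >= 0.5
print(objective(Q, alpha))
```

Storing the full $N \times N$ matrix is only practical for small $N$, which is one reason decomposition methods that touch a few columns of $Q$ at a time are attractive.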
SMO-SVM
A decomposition method updates only a subset of the variables $\alpha$ in each iteration. This subset, denoted as B, is called the working set.
If B is restricted to have only two elements, this special type of decomposition method is called Sequential Minimal Optimization (SMO).
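There are four steps to implement SMO:

Step 1: Find $\alpha^1$ as the initial feasible solution. Set $k = 1$.

Step 2: If $\alpha^k$ is a stationary point of (2), stop. Otherwise, find a two-element working set $B = \{i, j\} \subset \{1, \dots, l\}$. Define $N \equiv \{1, \dots, l\} \setminus B$, and $\alpha_B^k$ and $\alpha_N^k$ as the subvectors of $\alpha^k$ corresponding to $B$ and $N$, respectively. (Here $N$ denotes the index set of the non-working variables, not the number of training points.)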
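Step 3: If $a_{ij} \equiv K_{ii} + K_{jj} - 2K_{ij} > 0$, solve the following sub-problem with the variable $\alpha_B$:

$$\min_{\alpha_i, \alpha_j} \ \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (p_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} \qquad (4)$$

subject to

$$0 \le \alpha_i, \alpha_j \le C, \qquad y_i \alpha_i + y_j \alpha_j = -y_N^T \alpha_N^k,$$

else solve

$$\min_{\alpha_i, \alpha_j} \ \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (p_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + \frac{\tau - a_{ij}}{4}\left((\alpha_i - \alpha_i^k)^2 + (\alpha_j - \alpha_j^k)^2\right) \qquad (5)$$

subject to the constraints of (4). Here $p_B$ is the subvector of $p = -e$, the linear term of (1), corresponding to $B$, and $\tau$ is a small positive constant.

Step 4: Set $\alpha_B^{k+1}$ to be the optimal solution of (4), or of (5) when the else branch was taken, and $\alpha_N^{k+1} \equiv \alpha_N^k$. Set $k \leftarrow k + 1$ and go to Step 2.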
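The four steps can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the PMC-SVM or LibSVM code: the working set is chosen as the maximal violating pair (one common selection rule; the slides do not say which rule is used), the names `smo_train` and `decision` and the parameters `eps`, `tau`, `max_iter` are hypothetical, and the toy data is invented. Replacing a non-positive $a_{ij}$ by $\tau$ gives the same two-variable update as solving sub-problem (5).

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def smo_train(X, y, C=10.0, kernel=linear_kernel, eps=1e-3, tau=1e-12,
              max_iter=10000):
    n = len(X)
    K = [[kernel(X[s], X[t]) for t in range(n)] for s in range(n)]
    alpha = [0.0] * n   # Step 1: alpha = 0 satisfies the constraints of (1)
    grad = [-1.0] * n   # gradient Q alpha - e, evaluated at alpha = 0
    m = M = 0.0
    for _ in range(max_iter):
        # Step 2: pick the maximal violating pair (assumed selection rule)
        up = [t for t in range(n)
              if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
        low = [t for t in range(n)
               if (y[t] == -1 and alpha[t] < C) or (y[t] == 1 and alpha[t] > 0)]
        if not up or not low:
            break
        i = max(up, key=lambda t: -y[t] * grad[t])
        j = min(low, key=lambda t: -y[t] * grad[t])
        m, M = -y[i] * grad[i], -y[j] * grad[j]
        if m - M < eps:          # approximately stationary: stop
            break
        # Step 3: solve the two-variable sub-problem analytically
        a = K[i][i] + K[j][j] - 2.0 * K[i][j]
        if a <= 0.0:
            a = tau              # same update as sub-problem (5)
        room_i = C - alpha[i] if y[i] == 1 else alpha[i]
        room_j = alpha[j] if y[j] == 1 else C - alpha[j]
        step = min((m - M) / a, room_i, room_j)   # stay inside [0, C]
        d_i, d_j = y[i] * step, -y[j] * step      # keeps y^T alpha unchanged
        # Step 4: update alpha_B, keep alpha_N, refresh the gradient
        alpha[i] = min(max(alpha[i] + d_i, 0.0), C)
        alpha[j] = min(max(alpha[j] + d_j, 0.0), C)
        for t in range(n):
            grad[t] += y[t] * (y[i] * K[t][i] * d_i + y[j] * K[t][j] * d_j)
    return alpha, (m + M) / 2.0   # multipliers and bias estimate

def decision(X, y, alpha, bias, x, kernel=linear_kernel):
    return sum(a * t * kernel(xt, x) for a, t, xt in zip(alpha, y, X)) + bias

# Toy data (hypothetical): two linearly separable clusters
X = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y = [-1, -1, -1, 1, 1, 1]
alpha, bias = smo_train(X, y)
print([1 if decision(X, y, alpha, bias, x) > 0 else -1 for x in X])
```

Only two multipliers change per iteration, so each iteration needs just two columns of the kernel matrix; the sketch precomputes the whole matrix only to keep the code short.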
Parallel Multicategory SVM (PMC-SVM)
In multicategory classification with support vector machines, the algorithm generates $k(k-1)/2$ sub models for $k$ categories.
Generating these models is the most time-consuming task in the algorithm, so it is desirable to distribute the sub models onto multiple processors, with each processor performing a subtask, to improve performance.
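The count $k(k-1)/2$ is the number of unordered pairs of categories. A short sketch, assuming one sub model per pair of categories as that count suggests (the helper name `one_vs_one_pairs` is hypothetical):

```python
from itertools import combinations

def one_vs_one_pairs(k):
    # One sub model per unordered pair (a, b) of category indices, a < b
    return list(combinations(range(k), 2))

for k in (10, 16, 26):
    pairs = one_vs_one_pairs(k)
    print(k, len(pairs), k * (k - 1) // 2)
```

For 10, 16 and 26 categories this gives 45, 120 and 325 sub models, so the number of independent training tasks grows quadratically with the number of categories.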
Example:
We have 4 processors and k = 16, which means we have to generate k(k-1)/2 = 120 models in total.

$$T_{i,p} = (k - 1) - (N \times i + p), \qquad p = 0, \dots, N - 1,$$

where $N$ is the total number of processors and $k$ is the number of categories.
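To make the example concrete, the sketch below spreads the 120 sub models of the k = 16 case over 4 processors. It uses a plain round-robin assignment of category pairs, which is an assumption chosen for simplicity and not necessarily the assignment used by PMC-SVM; the function name `assign_round_robin` is hypothetical.

```python
from itertools import combinations

def assign_round_robin(k, n_procs):
    # Enumerate all k(k-1)/2 category pairs and deal them out cyclically:
    # processor p receives pairs p, p + n_procs, p + 2 * n_procs, ...
    pairs = list(combinations(range(k), 2))
    return [pairs[p::n_procs] for p in range(n_procs)]

tasks = assign_round_robin(16, 4)
for p, assigned in enumerate(tasks):
    print(f"processor {p}: {len(assigned)} sub models, first pair {assigned[0]}")
```

With 120 pairs and 4 processors each processor receives 30 sub models. Equal counts do not guarantee equal run time, because training time also depends on how many samples the two categories of a pair contain.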
Parallel Implementation and Environment
PMC-SVM was implemented on two platforms. One is the shared-memory SGI Origin 2800 supercomputer (sweetgum), equipped with 128 CPUs, 64 gigabytes of memory, and 1.6 terabytes of Fibre Channel disk.
The other is a distributed-memory Linux cluster (mimosa) with 192 nodes.
Parallel Evaluation and Analysis
PMC-SVM is tested on both the sweetgum and mimosa platforms using the following two datasets.
Dataset 1: Letter_scale
classes: 26
training size: 16,000
features: 16
Dataset 2: Mnist_scale
classes: 10
training size: 21,000
features: 780
Figure 2. The speedup of PMC-SVM on sweetgum with Dataset 1 (Letter_scale)
Figure 3. The speedup of PMC-SVM on mimosa with Dataset 1 (Letter_scale)
Figure 4. The speedup of PMC-SVM on sweetgum with Dataset 2 (Mnist_scale)
Figure 5. The speedup of PMC-SVM on mimosa with Dataset 2 (Mnist_scale)
Classifying Microarray Data
In this work, two microarray datasets were used to demonstrate the performance of PMC-SVM, as listed below:
Dataset 3: 14_Tumors (40Mb)
Human tumor types: 14
Normal tissue types: 12
Dataset 4: 11_Tumors (18Mb)
Human tumor types: 11
Table 6: Performance on sweetgum (Dataset 3)

| # of PEs | Time (s) | Speedup |
|---|---|---|
| 1 | 774.2 | - |
| 2 | 434.7 | 1.78 |
| 4 | 240.1 | 3.22 |
| 8 | 150.7 | 5.14 |
| 16 | 90.5 | 8.55 |
| 24 | 74.1 | 10.45 |

Table 7: Performance on sweetgum (Dataset 4)

| # of PEs | Time (s) | Speedup |
|---|---|---|
| 1 | 257.7 | - |
| 2 | 140.9 | 1.82 |
| 4 | 82.2 | 3.13 |
| 8 | 57.2 | 4.50 |
| 16 | 39.9 | 6.62 |
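The speedup column of Table 6 follows from the usual definition, speedup = (time on 1 PE) / (time on p PEs). The sketch below recomputes it from the Table 6 times and also derives the parallel efficiency, speedup / p, which is not reported on the slides and is added here only as an illustration; the function name `speedup_table` is hypothetical.

```python
def speedup_table(times):
    # times maps number of PEs -> wall-clock time in seconds
    t1 = times[1]
    rows = []
    for p in sorted(times):
        speedup = t1 / times[p]
        efficiency = speedup / p   # derived quantity, not in the original table
        rows.append((p, times[p], round(speedup, 2), round(efficiency, 2)))
    return rows

# Times taken from Table 6 (Dataset 3 on sweetgum)
table6 = {1: 774.2, 2: 434.7, 4: 240.1, 8: 150.7, 16: 90.5, 24: 74.1}
for p, t, s, e in speedup_table(table6):
    print(f"{p:>3} PEs  {t:>6.1f} s  speedup {s:>5.2f}  efficiency {e:.2f}")
```

The efficiency falls as processors are added, which is the expected pattern when a fixed number of sub models is divided among more and more PEs.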
Conclusions
PMC-SVM has been developed for classifying large datasets based on the SMO-type decomposition method.
The experimental results show that high-performance computing techniques and a parallel implementation can achieve a significant speedup, for example 10.45 on 24 PEs for Dataset 3 on sweetgum.
Thanks for your attendance!