gene clustering
DESCRIPTION
dkTRANSCRIPT
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 176
Gene clustering
Gene sorting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 276
Outline
Clustering
K-means
Annealed K-means
Kohonenrsquos Self -organizing algorithm
Yeast gene analysis
Cancer gene analysis
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 376
High dimensional data clustering
K-means
Annealed K-Means
Euclidean distance measure
Mahalanobis distance measure
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 476
Data clustering
Find cluster centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 576
Matlab stats toolbox
Add directory toolboxstats to matlab path
Load data_9mat
Kmeans clustering
[cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))
hold onplot(ctrs(1)ctrs(2)ro)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 276
Outline
Clustering
K-means
Annealed K-means
Kohonenrsquos Self -organizing algorithm
Yeast gene analysis
Cancer gene analysis
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 376
High dimensional data clustering
K-means
Annealed K-Means
Euclidean distance measure
Mahalanobis distance measure
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 476
Data clustering
Find cluster centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 576
Matlab stats toolbox
Add directory toolboxstats to matlab path
Load data_9mat
Kmeans clustering
[cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))
hold onplot(ctrs(1)ctrs(2)ro)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 376
High dimensional data clustering
K-means
Annealed K-Means
Euclidean distance measure
Mahalanobis distance measure
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 476
Data clustering
Find cluster centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 576
Matlab stats toolbox
Add directory toolboxstats to matlab path
Load data_9mat
Kmeans clustering
[cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))
hold onplot(ctrs(1)ctrs(2)ro)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 476
Data clustering
Find cluster centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 576
Matlab stats toolbox
Add directory toolboxstats to matlab path
Load data_9mat
Kmeans clustering
[cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))
hold onplot(ctrs(1)ctrs(2)ro)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 576
Matlab stats toolbox
Add directory toolboxstats to matlab path
Load data_9mat
Kmeans clustering
[cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))
hold onplot(ctrs(1)ctrs(2)ro)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 676
Matlab stats toolbox
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix Nxd
Center locations Kxd
Membership Nx1
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 776
Download
Internet
Example Add directory cluster to matlab path
Load data_9mat
Kmeans clustering [cidx ctrs] = kmeans(XX10)
plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)
My_Kmeans Parallel codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 876
[cidx ctrs] = kmeans(XX10)
Ten clusters
Data matrix dxN
Center locations dxK
Membership 1xN
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 976
K-means
An iterative approach
Non-overlapping partition
Means of partitioned groups
Non-overlapping partition Each data x
is partitioned to its nearest center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1076
Stepwise procedure
1 Input unsupervised high dimensional data
and the number K of clusters
2 Randomly initialize K centers near the center
of given data3 Partition all data to K clusters using current
means
4 Update the center position of each group5 If halting condition holds exit
6 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1176
Exercise
Draw a while-loop flow chart to illustrate
how the K-means algorithm partitions
high-dimensional data to K clusters
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1276
Exercise
Apply the K-means method to clustering
analysis of data in data_25mat
Calculate the mean distance between
each data point and its correspondent
center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1376
Annealed K-means
Use overlapping memberships
Overcome the local minimum problem
Relaxation under an annealing process
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1476
Overlapping memberships
Each x has an overlapping membership
to each cluster
The membership to the ith cluster is
proportional to exp(-βdi) where di denote
the distance between x and the ith center
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1576
Annealed K-means
1 Input unsupervised high dimensional dataand the number K of clusters
2 Set β to sufficiently small positive numberrandomly initialize K centers near the center
of given data3 Determine overlapping memberships of all
data to K clusters
4 Update the center position of each group
5 Increase β carefully 6 If halting condition holds exit
7 Go to step 3
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1676
Exercise
Draw a while-loop flow chart to illustrate
how the annealed K-means algorithm
partition high-dimensional data to K
clusters
Implement the flow chart using Matlab
codes
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1776
Set β sufficiently small
Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1876
Minimize the weight
distance
t
i
t
i
i
t
i
i
i
t
iii
t v
t t v
t t v
dy
dE
t t v
][
][][
0)][(][
0
][][E
i
2
x
y
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 1976
k
k
k
k
k
k
k k
k k
d C
d C
v
d C vd v
)exp(1
1)exp(
1
)exp()exp(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2076
j
j
k
k d
d v
)exp(
)exp()(
d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2176
Overlapping membership
Sufficiently small β
vk equals 1K
Sufficiently large β
vk = 1 if dk = mini di
= 0 otherwise
All vk are determined by the winner-take-all
principle
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2276
Why annealing
Sufficiently large β leads to the K-means
algorithm
Can annealing improve the performance
How to measure the performance for
clustering
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2376
center nearestthe][eachof distancetheupsums
10][][
][][
][][
EE
][][E
i
2
2
2
tot E
t vt if
t t v
t t v
t t v
i
t i
ii
i t
ii
i
i
t
iii
x
yx
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2476
Criterion of K-means
Minimal distance to
the nearest center
t i
ii t t 2][][E(y) yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2576
Cross distance
Refer to Math Programming
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2676
Self-organization
Clustering
Define neighboring relations of clusters
on a 2D lattice
Sorting high-dimensional data on a
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2776
Clustering by SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2876
SOM
Download Matlab Neural Networks
toolbox
Add directory nnet with sub-directories
to matlab path
Apply the SOM method to clustering
analysis of data in data_25mat
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 2976
Load data
load data_25
plot(XX(1)XX(2)g)
hold on
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3076
SOM
net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100
Initialization
net = train(netXX)
plotsom(netiw11netlayers1distances)
Training
Display
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3176
net = newsom([0 02 0 01][5 5]gridtop)
5x5 lattice
2x2 matrix for two-dimensional inputs
dx2 matrix for d-dimensional inputs
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3276
net = train(netXX)
dxN data matrix
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3376
Surface construction
demo_fr2_somm
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3476
Surface reconstruction
P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]
net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)
hold onplotsom(netiw11netlayers1distances)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3576
Self-organization
Maximal fitting and minimal wiring
principle
Sort high dimensional data on a 2D
lattice
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3676
Neural optimization
K-means
Minimal distance to the nearest center
Kohonenrsquos self -organizing
Minimal wiring distance between
neighboring centers
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3776
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
C1 i=1j=1
C2 i=1j=M
C3 i=Mj=1
C4 i=Mj=M
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3876
Neighboring relations on a lattice
An M-by-M lattice
A node is indexed by (ij) ij=1hellipM
S1 i=1j=2M-1
S2 i=2M-1j=M
S3 i=Mj=2M-1
S4 i=2M-1j=1
Corner 3
Corner 2
Corner 4
Corner 1 Side 1
Side 2
Side 3
Side 4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 3976
If (ij) not within S1-S4 and C1-C4
NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)
NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4076
Wiring Criterion
j M i jih
y yk k NB ji k jih
)1()(
W2
)()( )(
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4176
Kohonenrsquos self -organizing algorithm
Self-organizing map - Wikipedia the free
encyclopedia
Neural networks (1996)
Neural networks (2002)
Neurocomputing (2011)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4276
SOM matlab toolbox
SOM Toolbox
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4376
Neural Networks 15(3) 337-347 (2002)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4476
Neurocomputing 74(12-13) 2228-2240
(2011)
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4576
Minimal wiring and maximal fitting
criteria
Maximal fitting to a center Minimal distance to the nearest center
Wiring Criterion
t i
ii t t 2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W
2
)()(
)(
Minimal wiring and maximal fitting
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4676
Minimal wiring and maximal fitting
criteria
t iii t t
2
][][E(y) yx
j M i jih
y yk k NB ji
k jih
)1()(
W2
)()(
)(
WEF(y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4776
Set β sufficiently small Input all xt Set K
Halting
condition
Determine overlapping membership
of each xt
Update each yk
Increase β carefully
exit
Y
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4876
Minimize the weight distance
k
k NBh
k hk
t
k
i
k k
k NBh
k h
t
ik k
ySolve
y yt t v
dy
W E d
y yt t v
0)()][(][
0)(
W][][E
)(
)(
2
k
2
yx
yx
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 4976
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5076
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5176
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5276
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5376
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5476
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5576
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5676
Microarray expressions of yeast genes
Pat_Brown_Lab_Home_Page
Exploring the Metabolic and Genetic Control of Gene
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5776
Exploring the Metabolic and Genetic Control of Gene
Expression on a Genomic Scale
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5876
Load gene matrix
Load yeastdata822
Genes(1020)Display names of genesnumbered between 10 and 20
Gene matri 822 7
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 5976
Gene matrix 822x7
Gene id
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6076
Problem
Apply the k-means algorithm to processdata_25
Apply the annealed k-means algorithm to
process data_25 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6176
Problem
Apply the k-means algorithm to processyeast_822
Apply the annealed k-means algorithm to
process yeast_822 Numerical simulations for comparisons
of the two algorithms
Quantitative evaluations Statistics of mean and variance summarized
over 10 executions
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6276
Gene clustering by kmeans
Load data
load yeastdata822mat
times=[0 95 115 135 155 175 195]
plot(timesyeastvalues)
G l t i b k
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6376
Gene clustering by kmeans
Kmeans analysis
for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))
end
Display
Genes belongingthe second cluster
[cidx ctrs] = kmeans(yeastvalues 16)
genes(cidx==2)
E i f 16 l t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6476
Expressions of 16 gene clusters
St ti ti l D d
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6576
Statistical Dependency
load yeastdata822matX=yeastvalues
XX=X
W=jader(XX)
Y=WXX
X=Y
yeast_ICA_kmeansm
P bl Y t ICA K
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6676
Problem Yeast_ICA_Kmean
Apply Jade_ICA to translate yeast data to
independent components
Is it reasonable to employ Euclidean distance
to measure similarity of genes What is the impact of ICA on gene clustering
Numerical simulations
ICA
Annealed k-means clustering
Describe clusters of genes
P bl
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6776
Problem
Apply the Kohonenrsquos self -organizing
algorithm to sort yeast_822 on a lattice
Refer to Neurocomputing 74(12-13)
2228-2240 (2011)
Visualize yeast gene expressions at
each time course
G ICA SOM
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6876
Gene_ICA_SOM
JadeICA
load yeastdata822mat
SOM(20x20)
yeast822_som_20x20matdemo_gene_somm
C Di t
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 6976
Centers=netiw11
Y=Centers
X=P
M=size(Y1)N=size(X1)
A=sum(X^22)ones(1M)
C=ones(N1)sum(Y^22)
B=XY
D=sqrt(A-2B+C)
Cross Distances
C Di t M t i
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7076
Cross_Distance Matrix
D 822x400
Cross distances between 822 genes and
400 gene centers
400 clusters
Q
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7176
Query
How many genes in each cluster
Which cluster a gene belongs
[xx ind]=min(D)
sum(ind==k) how many genes in the kth cluster
ind(i)
Vi li ti f d it
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7276
Visualization of gene density
[xx ind]=min(D) K=20 M=K^2
for k=1M
num(k)=sum(ind==k) how many genes in the kth cluster
end
a = linspace(-11K)
x = []for i=1K
xx = [ones(201)a(i) a]
x = [x xx]
end
y = num x=x
Fitting (x y)
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7376
Fitting (xy)
Visualization of gene expressions
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7476
Visualization of gene expressions
Centers=netiw11
Y=Centers
a = linspace(-11K)
x = []
for i=1K
xx = [ones(201)a(i) a]x = [x xx]
end
time = 1
y = Y( time) x=x
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7576
Problem
Apply ICA to translate yeast_822 to
independent components
Apply SOM to process independent
components of yeast_822
Visualize yeast gene expressions at
each time course
Problem
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course
7182019 Gene Clustering
httpslidepdfcomreaderfullgene-clustering 7676
Problem
Apply ICA to translate yeast_822 toindependent components
Apply annealed SOM to process
independent components of yeast_822
Visualize yeast gene expressions at
each time course