gene clustering

76
 Gene clustering Gene sorting

Upload: rongchoichitu

Post on 14-Jan-2016

25 views

Category:

Documents


0 download

DESCRIPTION

dk

TRANSCRIPT

Page 1: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 176

Gene clustering

Gene sorting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 276

Outline

Clustering

K-means

Annealed K-means

Kohonenrsquos Self -organizing algorithm

Yeast gene analysis

Cancer gene analysis

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 376

High dimensional data clustering

K-means

Annealed K-Means

Euclidean distance measure

Mahalanobis distance measure

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 476

Data clustering

Find cluster centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 576

Matlab stats toolbox

Add directory toolboxstats to matlab path

Load data_9mat

Kmeans clustering

[cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))

hold onplot(ctrs(1)ctrs(2)ro)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 2: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 276

Outline

Clustering

K-means

Annealed K-means

Kohonenrsquos Self -organizing algorithm

Yeast gene analysis

Cancer gene analysis

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 376

High dimensional data clustering

K-means

Annealed K-Means

Euclidean distance measure

Mahalanobis distance measure

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 476

Data clustering

Find cluster centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 576

Matlab stats toolbox

Add directory toolboxstats to matlab path

Load data_9mat

Kmeans clustering

[cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))

hold onplot(ctrs(1)ctrs(2)ro)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 3: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 376

High dimensional data clustering

K-means

Annealed K-Means

Euclidean distance measure

Mahalanobis distance measure

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 476

Data clustering

Find cluster centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 576

Matlab stats toolbox

Add directory toolboxstats to matlab path

Load data_9mat

Kmeans clustering

[cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))

hold onplot(ctrs(1)ctrs(2)ro)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 4: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 476

Data clustering

Find cluster centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 576

Matlab stats toolbox

Add directory toolboxstats to matlab path

Load data_9mat

Kmeans clustering

[cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))

hold onplot(ctrs(1)ctrs(2)ro)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 5: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 576

Matlab stats toolbox

Add directory toolboxstats to matlab path

Load data_9mat

Kmeans clustering

[cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))

hold onplot(ctrs(1)ctrs(2)ro)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 6: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 676

Matlab stats toolbox

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix Nxd

Center locations Kxd

Membership Nx1

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 7: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 776

Download

Internet

Example Add directory cluster to matlab path

Load data_9mat

Kmeans clustering [cidx ctrs] = kmeans(XX10)

plot(XX(1)XX(2))hold onplot(ctrs(1)ctrs(2)ro)

My_Kmeans Parallel codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 8: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 876

[cidx ctrs] = kmeans(XX10)

Ten clusters

Data matrix dxN

Center locations dxK

Membership 1xN

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 9: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 976

K-means

An iterative approach

Non-overlapping partition

Means of partitioned groups

Non-overlapping partition Each data x

is partitioned to its nearest center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 10: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1076

Stepwise procedure

1 Input unsupervised high dimensional data

and the number K of clusters

2 Randomly initialize K centers near the center

of given data3 Partition all data to K clusters using current

means

4 Update the center position of each group5 If halting condition holds exit

6 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 11: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1176

Exercise

Draw a while-loop flow chart to illustrate

how the K-means algorithm partitions

high-dimensional data to K clusters

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 12: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1276

Exercise

Apply the K-means method to clustering

analysis of data in data_25mat

Calculate the mean distance between

each data point and its correspondent

center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 13: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1376

Annealed K-means

Use overlapping memberships

Overcome the local minimum problem

Relaxation under an annealing process

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 14: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1476

Overlapping memberships

Each x has an overlapping membership

to each cluster

The membership to the ith cluster is

proportional to exp(-βdi) where di denote

the distance between x and the ith center

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 15: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1576

Annealed K-means

1 Input unsupervised high dimensional dataand the number K of clusters

2 Set β to sufficiently small positive numberrandomly initialize K centers near the center

of given data3 Determine overlapping memberships of all

data to K clusters

4 Update the center position of each group

5 Increase β carefully 6 If halting condition holds exit

7 Go to step 3

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 16: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1676

Exercise

Draw a while-loop flow chart to illustrate

how the annealed K-means algorithm

partition high-dimensional data to K

clusters

Implement the flow chart using Matlab

codes

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 17: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1776

Set β sufficiently small

Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 18: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1876

Minimize the weight

distance

t

i

t

i

i

t

i

i

i

t

iii

t v

t t v

t t v

dy

dE

t t v

][

][][

0)][(][

0

][][E

i

2

x

y

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 19: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 1976

k

k

k

k

k

k

k k

k k

d C

d C

v

d C vd v

)exp(1

1)exp(

1

)exp()exp(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 20: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2076

j

j

k

k d

d v

)exp(

)exp()(

d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 21: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2176

Overlapping membership

Sufficiently small β

vk equals 1K

Sufficiently large β

vk = 1 if dk = mini di

= 0 otherwise

All vk are determined by the winner-take-all

principle

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 22: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2276

Why annealing

Sufficiently large β leads to the K-means

algorithm

Can annealing improve the performance

How to measure the performance for

clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 23: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2376

center nearestthe][eachof distancetheupsums

10][][

][][

][][

EE

][][E

i

2

2

2

tot E

t vt if

t t v

t t v

t t v

i

t i

ii

i t

ii

i

i

t

iii

x

yx

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 24: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2476

Criterion of K-means

Minimal distance to

the nearest center

t i

ii t t 2][][E(y) yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 25: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2576

Cross distance

Refer to Math Programming

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 26: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2676

Self-organization

Clustering

Define neighboring relations of clusters

on a 2D lattice

Sorting high-dimensional data on a

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 27: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2776

Clustering by SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 28: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2876

SOM

Download Matlab Neural Networks

toolbox

Add directory nnet with sub-directories

to matlab path

Apply the SOM method to clustering

analysis of data in data_25mat

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 29: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 2976

Load data

load data_25

plot(XX(1)XX(2)g)

hold on

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 30: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3076

SOM

net = newsom([0 02 0 01][5 5]gridtop)nettrainParamepochs=100

Initialization

net = train(netXX)

plotsom(netiw11netlayers1distances)

Training

Display

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 31: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3176

net = newsom([0 02 0 01][5 5]gridtop)

5x5 lattice

2x2 matrix for two-dimensional inputs

dx2 matrix for d-dimensional inputs

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 32: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3276

net = train(netXX)

dxN data matrix

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 33: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3376

Surface construction

demo_fr2_somm

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 34: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3476

Surface reconstruction

P = rand(22000)4-2f = exp(-sum(P^2))P =[Pf]

net = newsom([0 02 0 01 0 01][10 10]gridtop)plotsom(netlayers1positions)nettrainParamepochs=100net = train(netP)plot3(P(1)P(2)P(3)g)

hold onplotsom(netiw11netlayers1distances)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 35: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3576

Self-organization

Maximal fitting and minimal wiring

principle

Sort high dimensional data on a 2D

lattice

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 36: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3676

Neural optimization

K-means

Minimal distance to the nearest center

Kohonenrsquos self -organizing

Minimal wiring distance between

neighboring centers

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 37: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3776

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

C1 i=1j=1

C2 i=1j=M

C3 i=Mj=1

C4 i=Mj=M

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 38: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3876

Neighboring relations on a lattice

An M-by-M lattice

A node is indexed by (ij) ij=1hellipM

S1 i=1j=2M-1

S2 i=2M-1j=M

S3 i=Mj=2M-1

S4 i=2M-1j=1

Corner 3

Corner 2

Corner 4

Corner 1 Side 1

Side 2

Side 3

Side 4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 39: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 3976

If (ij) not within S1-S4 and C1-C4

NB(ij)= (i-1j)(i+1j)(ij-1)(i j+1)

NB(ij) can be separately defined if (ij)belongs S1- S4 and C1-C4

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 40: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4076

Wiring Criterion

j M i jih

y yk k NB ji k jih

)1()(

W2

)()( )(

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 41: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4176

Kohonenrsquos self -organizing algorithm

Self-organizing map - Wikipedia the free

encyclopedia

Neural networks (1996)

Neural networks (2002)

Neurocomputing (2011)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 42: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4276

SOM matlab toolbox

SOM Toolbox

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 43: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4376

Neural Networks 15(3) 337-347 (2002)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 44: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4476

Neurocomputing 74(12-13) 2228-2240

(2011)

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 45: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4576

Minimal wiring and maximal fitting

criteria

Maximal fitting to a center Minimal distance to the nearest center

Wiring Criterion

t i

ii t t 2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W

2

)()(

)(

Minimal wiring and maximal fitting

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 46: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4676

Minimal wiring and maximal fitting

criteria

t iii t t

2

][][E(y) yx

j M i jih

y yk k NB ji

k jih

)1()(

W2

)()(

)(

WEF(y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 47: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4776

Set β sufficiently small Input all xt Set K

Halting

condition

Determine overlapping membership

of each xt

Update each yk

Increase β carefully

exit

Y

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 48: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4876

Minimize the weight distance

k

k NBh

k hk

t

k

i

k k

k NBh

k h

t

ik k

ySolve

y yt t v

dy

W E d

y yt t v

0)()][(][

0)(

W][][E

)(

)(

2

k

2

yx

yx

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 49: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 4976

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 50: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5076

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 51: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5176

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 52: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5276

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 53: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5376

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 54: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5476

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 55: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5576

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 56: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5676

Microarray expressions of yeast genes

Pat_Brown_Lab_Home_Page

Exploring the Metabolic and Genetic Control of Gene

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 57: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5776

Exploring the Metabolic and Genetic Control of Gene

Expression on a Genomic Scale

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 58: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5876

Load gene matrix

Load yeastdata822

Genes(1020)Display names of genesnumbered between 10 and 20

Gene matri 822 7

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 59: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 5976

Gene matrix 822x7

Gene id

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 60: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6076

Problem

Apply the k-means algorithm to processdata_25

Apply the annealed k-means algorithm to

process data_25 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 61: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6176

Problem

Apply the k-means algorithm to processyeast_822

Apply the annealed k-means algorithm to

process yeast_822 Numerical simulations for comparisons

of the two algorithms

Quantitative evaluations Statistics of mean and variance summarized

over 10 executions

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 62: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6276

Gene clustering by kmeans

Load data

load yeastdata822mat

times=[0 95 115 135 155 175 195]

plot(timesyeastvalues)

G l t i b k

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 63: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6376

Gene clustering by kmeans

Kmeans analysis

for c = 116subplot(44c)plot(timesyeastvalues((cidx == c)))

end

Display

Genes belongingthe second cluster

[cidx ctrs] = kmeans(yeastvalues 16)

genes(cidx==2)

E i f 16 l t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 64: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6476

Expressions of 16 gene clusters

St ti ti l D d

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 65: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6576

Statistical Dependency

load yeastdata822matX=yeastvalues

XX=X

W=jader(XX)

Y=WXX

X=Y

yeast_ICA_kmeansm

P bl Y t ICA K

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 66: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6676

Problem Yeast_ICA_Kmean

Apply Jade_ICA to translate yeast data to

independent components

Is it reasonable to employ Euclidean distance

to measure similarity of genes What is the impact of ICA on gene clustering

Numerical simulations

ICA

Annealed k-means clustering

Describe clusters of genes

P bl

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 67: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6776

Problem

Apply the Kohonenrsquos self -organizing

algorithm to sort yeast_822 on a lattice

Refer to Neurocomputing 74(12-13)

2228-2240 (2011)

Visualize yeast gene expressions at

each time course

G ICA SOM

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 68: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6876

Gene_ICA_SOM

JadeICA

load yeastdata822mat

SOM(20x20)

yeast822_som_20x20matdemo_gene_somm

C Di t

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 69: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 6976

Centers=netiw11

Y=Centers

X=P

M=size(Y1)N=size(X1)

A=sum(X^22)ones(1M)

C=ones(N1)sum(Y^22)

B=XY

D=sqrt(A-2B+C)

Cross Distances

C Di t M t i

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 70: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7076

Cross_Distance Matrix

D 822x400

Cross distances between 822 genes and

400 gene centers

400 clusters

Q

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 71: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7176

Query

How many genes in each cluster

Which cluster a gene belongs

[xx ind]=min(D)

sum(ind==k) how many genes in the kth cluster

ind(i)

Vi li ti f d it

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 72: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7276

Visualization of gene density

[xx ind]=min(D) K=20 M=K^2

for k=1M

num(k)=sum(ind==k) how many genes in the kth cluster

end

a = linspace(-11K)

x = []for i=1K

xx = [ones(201)a(i) a]

x = [x xx]

end

y = num x=x

Fitting (x y)

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 73: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7376

Fitting (xy)

Visualization of gene expressions

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 74: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7476

Visualization of gene expressions

Centers=netiw11

Y=Centers

a = linspace(-11K)

x = []

for i=1K

xx = [ones(201)a(i) a]x = [x xx]

end

time = 1

y = Y( time) x=x

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 75: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7576

Problem

Apply ICA to translate yeast_822 to

independent components

Apply SOM to process independent

components of yeast_822

Visualize yeast gene expressions at

each time course

Problem

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course

Page 76: Gene Clustering

7182019 Gene Clustering

httpslidepdfcomreaderfullgene-clustering 7676

Problem

Apply ICA to translate yeast_822 toindependent components

Apply annealed SOM to process

independent components of yeast_822

Visualize yeast gene expressions at

each time course