Lloyd Algorithm K-Means Clustering

Upload: noah-norris

Post on 18-Jan-2018

TRANSCRIPT

Page 1: Lloyd Algorithm K-Means Clustering. Gene Expression Susumu Ohno: whole genome duplications The expression of genes can be measured over time. Identifying

Lloyd Algorithm K-Means Clustering

Page 2:

Gene Expression

• Susumu Ohno: whole genome duplications

• The expression of genes can be measured over time.

• Identifying which genes are expressed at a given moment can help determine function.

Page 3:

Grouping

• Grouping genes by derivative.

• Data must be clustered by derivative.
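
The slides do not say how the derivative is obtained. One plausible reading, offered here only as an assumption, is to replace each gene's expression time series with the slopes between consecutive measurements and to cluster those slope vectors. The function name `expression_derivative` and the sample profiles are hypothetical.

```python
def expression_derivative(levels, times=None):
    """Approximate the derivative of one gene's expression profile.

    levels: expression measurements at successive time points.
    times:  matching time stamps; assumed to be 0, 1, 2, ... if omitted.
    Returns the slope between each pair of consecutive measurements.
    """
    if times is None:
        times = list(range(len(levels)))
    if len(times) != len(levels):
        raise ValueError("levels and times must have the same length")
    return [
        (levels[i + 1] - levels[i]) / (times[i + 1] - times[i])
        for i in range(len(levels) - 1)
    ]


# Hypothetical profiles: two genes that change at the same rate from
# different baselines end up with identical derivative vectors.
gene_a = [1.0, 2.0, 4.0, 4.0]
gene_b = [5.0, 6.0, 8.0, 8.0]
slopes_a = expression_derivative(gene_a)
slopes_b = expression_derivative(gene_b)
```

Under this reading, genes with different absolute expression levels but the same pattern of change fall into the same cluster.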

Page 4:

Clustering Problems

• Cluster d data points into k clusters such that each point is closer to the points in its own cluster than to the points in any other cluster.

• Data is usually not that clearly organized.

Page 5:

Lloyd’s Algorithm

• Assign each point to the cluster whose center is nearest.

• Set each cluster's center of gravity as its new center, and repeat until the centers no longer change. The goal is to minimize the squared error distortion.
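
The slide names the squared error distortion without defining it. A common definition, assumed here, is the mean over all points of the squared Euclidean distance to the nearest center. The function name `squared_error_distortion` and the sample points are illustrative only.

```python
def squared_error_distortion(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance from p to the closest of the candidate centers.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total / len(points)


# Four hypothetical points forming two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Centers placed in the middle of each pair give a low distortion...
good = squared_error_distortion(points, [(0.0, 1.0), (10.0, 1.0)])
# ...while centers placed between the pairs give a much higher one.
poor = squared_error_distortion(points, [(5.0, 0.0), (5.0, 2.0)])
```

Comparing `good` and `poor` shows why the choice of centers matters: the same points score very differently depending on where the centers sit.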

Page 6:

The Computational Problem

• Input: A matrix of points, each with m dimensions, and the desired number of clusters k.

• Output: The points organized into k clusters so that each point's distance from its cluster center is minimized, plus a visual representation of the data.

Page 7:

Pseudo-pseudocode

• Arbitrarily assign k centers.

• Assign points to k clusters, minimizing Euclidean distance from center.

• Assign cluster center of gravity as new center.

• Repeat until algorithm converges.
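
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the presenter's implementation: the function names, the `seed` and `max_iter` parameters, the choice of initial centers by sampling input points, and the handling of empty clusters are all assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Center of gravity (coordinate-wise mean) of a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def lloyd(points, k, seed=0, max_iter=100):
    """Cluster a list of point tuples into k clusters; return (centers, clusters)."""
    rng = random.Random(seed)
    # Step 1: arbitrary centers (assumption: k input points chosen at random).
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 3: move each center to its cluster's center of gravity.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            centroid(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D data: two well-separated pairs of points.
points = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 11.0)]
centers, clusters = lloyd(points, k=2)
```

Lloyd's algorithm only finds a local optimum, so on less cleanly separated data the result can depend on the initial centers; rerunning with several `seed` values and keeping the lowest-distortion result is a common workaround.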

Page 8:

Plotting

Page 9:

Plotting