Lloyd Algorithm K-Means Clustering
Gene Expression
• Susumu Ohno: whole genome duplications
• The expression of genes can be measured over time.
• Identifying which genes are expressed at a given moment can help determine function.
Grouping
• Grouping genes by derivative.
• Data must be clustered by derivative.
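The slides do not say how the derivative is computed. As a minimal sketch, assuming each gene's expression is measured at evenly spaced time points, the derivative could be approximated by the difference between consecutive measurements. The function name `expression_derivative`, the step `dt`, and the gene names and values are hypothetical, chosen only for illustration.

```python
def expression_derivative(series, dt=1.0):
    """Approximate the rate of change between consecutive measurements.

    Assumes evenly spaced time points separated by dt (an assumption,
    not something stated in the slides).
    """
    return [(b - a) / dt for a, b in zip(series, series[1:])]


# Hypothetical expression levels for two genes at four time points.
genes = {
    "geneA": [1.0, 2.0, 4.0, 4.0],
    "geneB": [5.0, 4.0, 2.0, 2.0],
}

# One derivative vector per gene; these vectors are what would be clustered.
derivatives = {name: expression_derivative(levels) for name, levels in genes.items()}
```

Under this assumption, each gene becomes a point whose coordinates are its rates of change, so genes that rise and fall together end up close to each other.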
Clustering Problems
• Cluster d data points into k clusters such that each point is closer to the points in its own cluster than to those in any other cluster.
• Data is usually not that clearly organized.
Lloyd’s Algorithm
• Assign each point to the cluster whose center is nearest, minimizing the distance between points and cluster centers.
• Set each cluster's center of gravity as its new center, and repeat until the centers no longer change. The goal is to minimize the squared error distortion.
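The slides name the squared error distortion without defining it. One common definition is sketched below. The symbols are assumptions rather than the slides' own notation: v_1, ..., v_n are the data points, X = {x_1, ..., x_k} is the set of cluster centers, and d(v, X) is the Euclidean distance from v to its nearest center.

```latex
d(v, X) = \min_{1 \le j \le k} \lVert v - x_j \rVert
\qquad
d(V, X) = \frac{1}{n} \sum_{i=1}^{n} d(v_i, X)^2
```

Under this definition, neither the assignment step nor the center update can increase d(V, X), which is why the iteration settles.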
The Computational Problem
• Input: A matrix of data points, each with m dimensions, and the desired number of clusters k.
• Output: Points organized into k clusters, minimizing distance from center, and a visual representation of the data.
Pseudo-pseudocode
• Arbitrarily assign k centers.
• Assign points to k clusters, minimizing Euclidean distance from center.
• Assign cluster center of gravity as new center.
• Repeat until the algorithm converges.
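The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the presenters' implementation: the names `lloyd_kmeans` and `distortion`, the `max_iter` and `seed` parameters, the choice of k random data points as initial centers, and the handling of empty clusters are all choices made here.

```python
import math
import random


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Step 1: arbitrarily pick k distinct data points as initial centers
    # (one possible reading of "arbitrarily assign k centers").
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the center of gravity of its cluster.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(
        min(math.dist(p, c) for c in centers) ** 2 for p in points
    ) / len(points)
```

A small usage example with two well-separated groups of made-up 2-D points:

```python
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd_kmeans(pts, k=2)
print(sorted(centers), distortion(pts, centers))
```

Because the result depends on the initial centers, different seeds can in general lead to different local optima, so running the sketch from several starting points and keeping the lowest distortion is a common precaution.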
Plotting