Gene Expression Data Analysis
![Page 1: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/1.jpg)
Analysis of Gene Expression Data
_______________________
Jhoirene B. Clemente
Algorithms and Complexity Lab
University of the Philippines Diliman
![Page 2: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/2.jpg)
Overview
● Definitions
● Clustering of Gene Expression Data
● Visualizations of Gene Expression Data
![Page 3: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/3.jpg)
Definitions
Gene
Basic unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of protein or for an RNA chain that has a function in the organism.
Gene Expression Data
The expression levels of an individual's genes, measured using microarray technology.
![Page 4: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/4.jpg)
Definitions
![Page 5: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/5.jpg)
Definitions
![Page 6: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/6.jpg)
Definitions
Gene Expression Data

| Gene | Gene Expression |
| --- | --- |
| a | |
| b | |
| c | |
| ... | |
| n | |

![Page 7: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/7.jpg)
Definitions
Gene Expression Data

| Gene | Gene Expression |
| --- | --- |
| a | |
| b | |
| c | |
| ... | |
| n | |

1 sample (the single column), n genes (the rows)
![Page 8: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/8.jpg)
Definitions
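
| Gene | Sample 1 | Sample 2 | ... | Sample m |
| --- | --- | --- | --- | --- |
| a | | | | |
| b | | | | |
| c | | | | |
| ... | | | | |
| n | | | | |

m samples (columns), n genes (rows)
(n x m) Data Matrix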
![Page 10: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/10.jpg)
Clustering
Clustering is the unsupervised classification of patterns (observations, data sets, and feature vectors) into groups called clusters, such that objects in the same cluster are similar to each other while objects in different clusters are as dissimilar as possible.
Image source: ima.umn.edu
![Page 12: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/12.jpg)
Cluster Analysis
Preprocessing
● Filtering
● Normalization
Clustering
Analysis
![Page 13: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/13.jpg)
Clustering
Partitional
● K-means Algorithm
● X-means Algorithm
Hierarchical
![Page 14: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/14.jpg)
Clustering
Given the (n x m) data matrix, we can
● Cluster the set of genes
● Cluster the set of samples
● Cluster the set of genes and samples simultaneously.
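A minimal sketch of the first two options, assuming the data matrix is stored as a plain list of rows. The gene names, sample names, and values below are made up for illustration. Clustering genes treats each row as a point; clustering samples treats each column as a point, which amounts to transposing the matrix.

```python
# Hypothetical toy matrix: 4 genes (rows) x 3 samples (columns).
# All names and values are invented for illustration only.
genes = ["a", "b", "c", "d"]
samples = ["Sample 1", "Sample 2", "Sample 3"]
data = [
    [0.9, 0.8, -0.7],
    [0.8, 0.9, -0.6],
    [-0.5, -0.4, 0.9],
    [-0.6, -0.5, 0.8],
]


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Clustering genes: each gene is a point in R^m (one coordinate per sample).
gene_vectors = dict(zip(genes, data))

# Clustering samples: each sample is a point in R^n (one coordinate per gene).
sample_vectors = dict(zip(samples, transpose(data)))
```

Either dictionary can then be handed to the same clustering routine; only the meaning of a "point" changes.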
![Page 15: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/15.jpg)
Data Set
The data set is time-series gene expression data from a synchronized population of yeast.
![Page 17: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/17.jpg)
Preprocessing
Filtering
● Removed genes not involved in cell cycle regulation
● Removed genes belonging to more than one group
Normalization
● All gene expression values range from -1.0 to 1.0.
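The slides do not say how the values were brought into the range -1.0 to 1.0. One common option is sketched here, assuming a linear (min-max) rescaling applied to each gene's profile separately. The function names and the toy values are hypothetical.

```python
def scale_row(values, low=-1.0, high=1.0):
    """Linearly rescale one gene's expression profile into [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant profile has no shape to preserve; map it to the midpoint.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def normalize(matrix):
    """Apply the per-gene rescaling to every row of the data matrix."""
    return [scale_row(row) for row in matrix]


# Made-up raw values: 2 genes x 3 samples.
raw = [
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 5.0],
]
normalized = normalize(raw)
```

Rescaling per gene keeps the shape of each time course while removing differences in absolute expression level, so distances between genes reflect pattern rather than magnitude.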
![Page 18: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/18.jpg)
Data Set
Data matrix (384 genes and 17 samples) with 5 classifications. Groupings are based on cell cycle phase activation.
![Page 19: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/19.jpg)
Data Set
Group 1: Resting Phase
![Page 20: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/20.jpg)
Data Set
Group 2: First Growth Phase
![Page 21: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/21.jpg)
Data Set
Group 3: Synthesis Phase
![Page 22: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/22.jpg)
Data Set
Group 4: Second Growth Phase
![Page 23: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/23.jpg)
Data Set
Group 5: Cell Division
![Page 24: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/24.jpg)
Clustering of genes
K-means Algorithm
Given n data points in R^d:
1. Choose k initial centers, one for each of the k clusters
2. Assign every data point to the cluster with the nearest center (Euclidean distance, Manhattan distance, etc.)
3. Recompute each of the k centers as the mean of the points assigned to it
4. Repeat steps 2 and 3 until convergence
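The steps above can be sketched in plain Python as follows. This is an illustrative version, not the implementation used for the results in these slides. The function names, the `max_iter` safeguard, and the rule that an empty cluster keeps its previous center are assumptions.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, centers, distance=euclidean, max_iter=100):
    """Run steps 2-4 starting from the given initial centers (step 1)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = mean_point(members)
    return labels, centers


# Toy data: two well-separated groups in R^2, first two points as centers.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels, centers = k_means(points, centers=points[:2])
```

On this toy input the loop settles on `labels == [0, 0, 1, 1]`. Passing `distance=manhattan` swaps the metric used in step 2 without touching the rest of the loop.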
![Page 25: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/25.jpg)
Clustering of genes
K-means Algorithm
k = 5, since we want to approximate the 5 biological classifications.
![Page 26: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/26.jpg)
Clustering of genes
Initialization
Possible strategies for choosing the k initial centers:
1. Choose the k centers that maximize the distance between the clusters
2. Sort the distances between all the data points, then choose the k initial points at constant intervals from the sorted list
3. Use the first k points in the data set as the first k centers
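Two of these strategies can be sketched briefly. The "first k points" option is direct. For the "maximize the distance" option, the slides do not give a procedure, so the greedy farthest-first rule below is an assumed stand-in, and it starts arbitrarily from the first data point. The function names and toy points are hypothetical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def first_k(points, k):
    """Use the first k points of the data set as the initial centers."""
    return [list(p) for p in points[:k]]


def farthest_first(points, k, distance=euclidean):
    """Greedy spread-out initialization (an assumed reading of strategy 1).

    Starts from the first point, then repeatedly adds the point whose
    distance to its closest already-chosen center is largest.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        best = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(list(best))
    return centers


# Made-up points in R^2.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
simple_init = first_k(points, 3)
spread_init = farthest_first(points, 3)
```

Here `first_k` picks two nearly identical centers from the same corner of the data, while the greedy rule spreads the three centers across the point set, which illustrates why the choice of initialization can change the final clusters.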
![Page 27: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/27.jpg)
Clustering of genes
Using k-means clustering, with k = 5
![Page 28: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/28.jpg)
Clustering of genes
● Clustering may suggest possible roles for genes with unknown functions.
● Clustering the samples or experiments may shed light on new subtypes of diseases.
● It may help identify which type of treatment is suited to a specific type of cancer.
● It may support building genetic networks.
![Page 29: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/29.jpg)
Visualization
● Vector Fusion
● Non-metric Multidimensional Scaling (nMDS)
● Principal Components Analysis (PCA)
![Page 30: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/30.jpg)
Vector Fusion
A visualization technique that uses the single point broken line parallel algorithm.
![Page 31: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/31.jpg)
nMDS visualization
Input: dissimilarity matrix $\Delta = [\delta_{ij}]$ (actual distances)
● In nMDS, only the rank order of the entries is assumed to contain the significant information.
● Thus, the purpose of the non-metric MDS algorithm is to find a configuration of points whose distances reflect the rank order of the data as closely as possible.
● The transformation uses a nonparametric monotone function $f$ (monotone regression):
$d_{ij} = f(\delta_{ij})$ (pseudo-distance)
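The monotone regression step can be illustrated with the pool-adjacent-violators procedure, a standard way to compute it. The slides do not say which method was used, so this is a sketch under that assumption. The inputs are flat lists with one entry per pair of points: the input dissimilarities and the distances in the current low-dimensional configuration. The function name and toy values are hypothetical, and ties among dissimilarities are simply left in input order.

```python
def monotone_regression(dissimilarities, distances):
    """Return pseudo-distances for each pair of points.

    The result is the sequence that is non-decreasing in the rank order of
    `dissimilarities` and closest to `distances` in the least-squares sense.
    """
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    # Each block stores [sum of values, count]; its fitted value is sum / count.
    blocks = []
    for i in order:
        blocks.append([distances[i], 1])
        # Pool adjacent blocks while their means violate the required order.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)
    # Put the fitted values back in the original pair order.
    fitted = [0.0] * len(distances)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


# Made-up example with four pairs of points.
dissimilarities = [3.0, 1.0, 4.0, 2.0]
distances = [2.0, 1.0, 4.0, 3.0]
pseudo = monotone_regression(dissimilarities, distances)
```

In this example the pairs ranked second and third have distances 3.0 and 2.0, which contradicts the rank order, so both are replaced by their mean 2.5. A full nMDS procedure alternates a step like this with moving the points to bring their distances closer to the pseudo-distances.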
![Page 32: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/32.jpg)
PCA
![Page 33: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/33.jpg)
Vector fusion visualization
![Page 34: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/34.jpg)
nMDS visualization
![Page 35: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/35.jpg)
nMDS visualization
![Page 36: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/36.jpg)
nMDS visualization
![Page 37: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/37.jpg)
nMDS visualization
![Page 38: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/38.jpg)
nMDS visualization
![Page 39: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/39.jpg)
nMDS visualization
![Page 40: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/40.jpg)
nMDS visualization
![Page 41: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/41.jpg)
References
2010: Clemente J., Salido J.A., "Non-Metric Multidimensional Scaling and Vector Fusion Visualization of Cell Cycle Independent Gene Expressions for Gene Function Analysis". Published in the conference proceedings of the National Conference on Information Technology for Education (NCITE) 2010 and in the Philippine IT Journal, Feb 2011 issue.
2010: Clemente J., "Cluster Analysis for Identifying Genes Highly Correlated with a Phenotype". Undergraduate thesis, Department of Computer Science, University of the Philippines Diliman.
![Page 42: Gene Expression Data Analysis](https://reader034.vdocuments.net/reader034/viewer/2022052307/554a0f99b4c9055c598b4a50/html5/thumbnails/42.jpg)
Thank you for Listening