molecular biology - uspjb/lectures/bioinformatics/gene-amsterdam.… · molecular biology: from...
TRANSCRIPT
![Page 1: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/1.jpg)
Molecular Biology:
from sequence analysis to signal processing
Junior Barrera
University of Sao Paulo
![Page 2: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/2.jpg)
Layout
• Introduction
• Knowledge evolution in Genetics
• Data acquisition
• Data Analysis
• A system for genetic data analysis
• Applications
![Page 3: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/3.jpg)
Introduction
• Some medical signals are: EEG, ECG, ultra
sound, tomography, etc.
• These signals are a great source of
information about the human body
• For fully exploration of these data, Digital
Signal Processing Techniques are necessary
• DSP: Algebra + Statistics + Computation
![Page 4: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/4.jpg)
Introduction
• Techniques for the identification of genetic
code are well known
• Soon the code of all genes will be known
• This knowledge open the way for one of the
greatest challenges of science: the
understanding of genes functionality
![Page 5: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/5.jpg)
Introduction
• Sets of genes constitute dynamical systems
that control sequences of Biochemical
reactions, called pathway
• The pathways in a cell define its activities
• States of large sets of genes can be observed
by the microarray technology
• Gene states observed in time are digital
signals
![Page 6: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/6.jpg)
Introduction
• Gene states may describe properties of
tissues. Pattern Recognition techniques
permits to predict tissue properties.
Example: cancer classification
• System Identification techniques permit to
estimate net architectures and dynamics.
Example: control of cell division
![Page 7: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/7.jpg)
Knowledge evolution in genetics
• Heredity - Mendel (1866)
• The phenotypes of an individual depends on
genes of his parents.
![Page 8: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/8.jpg)
Knowledge evolution in genetics
• Chromosome Theory - Morgan (1910)
• Genes were situated in chromosomes
![Page 9: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/9.jpg)
Introduction
• The molecular structure of chromosomes
(Watson and Crick - 1953)
• DNA structure: the double helix
• Four basis: adenine(A), guanine(G),
thymine(T), cytosine(C)
• genes are sequences of nucleotides
![Page 10: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/10.jpg)
Introduction
![Page 11: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/11.jpg)
Introduction
• DNA manipulation
• cut, replication and decoding
![Page 12: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/12.jpg)
Introduction
• Genetic engineering
• species modification, drug production
![Page 13: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/13.jpg)
Introduction
• Genes control the metabolism
• Metabolism occurs by sequences of
enzyme-catalyzed reactions.
• Enzymes are specified by one or more
genes
![Page 14: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/14.jpg)
Introduction
![Page 15: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/15.jpg)
Introduction
• Gene expression
![Page 16: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/16.jpg)
Data acquisition
![Page 17: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/17.jpg)
Data acquisition
![Page 18: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/18.jpg)
Data acquisition
Quantization - {-1,0,1}
![Page 19: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/19.jpg)
Data Analysis
• Data classes definition
• Relational search
• Data transformation
• Mining
• Integration of information
• Interpretations
![Page 20: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/20.jpg)
Data classes definition
molecule
cell
tissue
organism
Protein structure
and dynamics
DNA, Protein,
Gene Expression,
Gene Networks
Population
![Page 21: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/21.jpg)
Relational Search
• Get a subset of the available data
• Define relations between categories of data
• Select by logical operations on relations
![Page 22: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/22.jpg)
Data Transformations
• Image analysis
• Measures on DNA sequences
• Measure on Protein sequences
• ...
![Page 23: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/23.jpg)
Mining
Classifier
Examples : DNA assembling, Protein and DNA homology, DNA philogeny,
genes characterizations of tissue, time pattern similarity
![Page 24: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/24.jpg)
Mining
• Attribute Space
x1
x2
x=(x1,x2)
![Page 25: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/25.jpg)
Mining
• Clustering
x1
x2
x1
x2
![Page 26: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/26.jpg)
Mining
x1
x2
x1
x2
• Classification
![Page 27: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/27.jpg)
Mining
• Attribute Space Dimension
x1
x2
![Page 28: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/28.jpg)
MiningDynamical System
Examples : gene networks, protein structure, cells, organisms, populations, drug
reaction
x1= y1
x2 = y2
t
t
initial
state u
u’Final state
Final state
![Page 29: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/29.jpg)
Mining
![Page 30: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/30.jpg)
Integration of Information
• different resolutions data classes
• transformed data
• selected data
• mined data
![Page 31: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/31.jpg)
Interpretation
• Integrated information
• Known concepts
• Propose hypothesis
• confirm or negate hypothesis
![Page 32: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/32.jpg)
A system for genetic data
analysis
• Database
• Analytical procedures
• Data mining
• High performance computing
![Page 33: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/33.jpg)
System
P1
P2
Pn
Pi : analytical and mining procedures (kernel parallel)
Objected oriented database
![Page 34: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/34.jpg)
System Architecture
DB/2DB/2
SybaseSybase
OracleOracle
InformixInformix
SliceSlice NNSliceSlice 11
Appl. n
Appl. 3
Appl. 2
Appl. 1
JavaJava
PBPB
DelphiDelphi
MATLAMATLA
N-2 slices
Consulte rules and libraries
![Page 35: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/35.jpg)
Data Warehouse
M_database
Analysis tools writer SpreadSheet Dataminig
MOLAP/ROLAP server
IMS OthersRDMS
Wrapper Wrapper Wrapper
Integrator
Metadata Data
Mart
Users
1
2
3
4
![Page 36: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/36.jpg)
Applications
• Cancer tissue characterization
• Cell cycle simulation
• Inference from clustering
• Gene regulation
![Page 37: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/37.jpg)
Cancer tissue characterization
• Problem: from a small set (20) of
microarrays, find a minimum number of
genes that are enough to separate cancer A
and B.
+bound
-bound
E[εn(d)]
εd
Mis
cla
ssif
icati
on
err
or
number of variables, d
( )dnε̂
![Page 38: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/38.jpg)
• Approach: randomize data, compute
classifier using genes subsets, measure error
for different dispersions, choose the subset
that balance small error and high dispersion.
A supercomputer is required.
![Page 39: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/39.jpg)
rk
• Linear classifier
• Dispersion centered in the sample
• Flat round dispersion model
• Error computed analytically (faster)
h
R
22hR −
Misclassification
error, εn(h,R)
Hyper-plane section: Vn-1
Xj [R1]
Xi [Rn-1]
xk
![Page 40: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/40.jpg)
• Robustness analysis
rj
rk
![Page 41: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/41.jpg)
2 4 6 8 10 12 14 16 18 200
0.02
0.04
0.06
0.08
0.1
0.12
# of variables, d
mis
cla
ssific
ation e
rorr
, ε
Error curves under various dispersion levels, σ
σ = 0.20
σ = 0.30
σ = 0.40
σ = 0.50
σ = 0.60
σ = 0.70
![Page 42: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/42.jpg)
−3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 0.5 1−1.5
−1
−0.5
0
0.5
1
1.5
2
phosphofructokinase, platelet
v−
erb
−b2 a
via
n e
ryth
robla
stic leukem
ia v
iral oncogene h
om
olo
g 2
LINEAR CLASSIFIER (DISPERSED−GAUSSIAN) w/ σ = 0.600
1−th (0.389593)
7−th (0.416607)
BRCA1 BRCA2/sporadic
the tumor sample from Patient 20
![Page 43: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/43.jpg)
• Cell cycle simulation
External Signal
Receptors
Ingredients Control
p53
Division Steps
Excitation
Excitation or Inhibition
Inhibition
![Page 44: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/44.jpg)
Inference from clustering
• Examine the precision of sample-based
clustering relative to population inference
• Compare the number of replicates of
microarray experiments
• Compare the various clustering methods
![Page 45: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/45.jpg)
1 2 3 4 5 6 7-6
-4
-2
0
2
4
6
timelo
g2(r
atio)
time course data
C1
C2
C3
-40 -20 0 20 40 60 80 100 120 140-20
-10
0
10
20
30
40
50
60KL plot
C1
C2
C3
Dendrogram
Time course data
KL plotmultidimensional space
different views of different views of different views of
Microarray dataMicroarray dataMicroarray datamis-classified
Original clusters
Clustered by dendrogram
![Page 46: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/46.jpg)
Time Course Model
Class 1
[low variance]
measurement
Class 2
[high variance]
measurement
![Page 47: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/47.jpg)
Clustering algorithms
• K-means
• Fuzzy c-means
• Self Organizing Map
• Hierarchical (dendrogram)
![Page 48: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/48.jpg)
Clustering errors
• How clustering errors change as the number
of replicates increases?
• How differently each clustering algorithm
perform?
![Page 49: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/49.jpg)
K-means
0 5 10 15 20 25 30 35 40 45 500
10
20
30
40
50
60
70
80
90
100
# of replicates
# o
f m
iscla
ssific
ations
k-means
0.25
0.5
1.0
2.0
3.0
![Page 50: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/50.jpg)
Fuzzy c-means
0 5 10 15 20 25 30 35 40 45 500
10
20
30
40
50
60
70
80
90
100
# of replicates
# o
f m
iscla
ssific
ations
Fuzzy c-means
0.25
0.5
1.0
2.0
3.0
Clustering errorsClustering errorsClustering errorsClustering errors
![Page 51: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/51.jpg)
Self Organizing Maps
0 5 10 15 20 25 30 35 40 45 500
10
20
30
40
50
60
70
80
90
100
# of replicates
# o
f m
iscla
ssific
ations
SOMs
0.25
0.5
1.0
2.0
3.0
Clustering errorsClustering errorsClustering errorsClustering errors
![Page 52: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/52.jpg)
Variances of Data x Replicates
• The number of replicates required to get a
reasonable clustering result varies,
depending on the variance of gene
expression levels
• Clustering algorithm must also be chosen
correspondingly to get the best clustering
algorithm. No universal clustering
algorithm!
![Page 53: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/53.jpg)
No replicatevariance = 0.25
misclassifications
Tighter clusters due to small variance
Results from Fuzzy c-means
![Page 54: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/54.jpg)
No replicatevariance = 1.0
manymisclassifications
clusters start mixing
![Page 55: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/55.jpg)
No replicatevariance = 2.0
LOTS OFmisclassifications
Looser clusters due to large variance
![Page 56: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/56.jpg)
5 replicatesvariance = 0.25
NOmisclassifications
Much tighter clusters due to the replications
![Page 57: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/57.jpg)
5 replicatesvariance = 1.0
Very fewmisclassifications
Clusters well separated due to the replications and relatively small variance
![Page 58: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/58.jpg)
5 replicatesvariance = 2.0
Very fewmisclassifications
Clusters tightened due to the replications
![Page 59: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/59.jpg)
Real Data Application
• Initial clustering to generate templates
– means
– variance
• Simulate time course data based on the
templates generated by initial clustering
– different number of replicates
• Apply various clustering methods
– expected clustering error for each method
![Page 60: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/60.jpg)
5 templates by HC
-4
-2
0
2
428
-4
-2
0
2
412
-4
-2
0
2
4208
-4
-2
0
2
4194
-4
-2
0
2
4592
![Page 61: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/61.jpg)
Clustering errors on HC
0 2 4 6 8 10 12 14 16 18 200
5
10
15
20
25HC-EU-CL 5 templates(non-exclusive)
Repetition (N)
Err
or
(%)
FCM
K-means
SOM
HC-CC-CL
HC-EU-CL
![Page 62: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/62.jpg)
5 templates by FCM
-2
-1
0
1
2
130
-2
-1
0
1
2
166
-2
-1
0
1
2
134
-2
-1
0
1
2
330
-2
-1
0
1
2
274
![Page 63: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/63.jpg)
Clustering errors on FCM
0 2 4 6 8 10 12 14 16 18 200
5
10
15
20
25
30
35FCM 5 templates(non-exclusive)
Repetition (N)
Err
or
(%)
FCM
K-means
SOM
HC-CC-CL
HC-EU-CL
![Page 64: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/64.jpg)
Gene Regulation
• Problem: find the architecture of a gene
regulation network from microarray data.
• Approach: choose small subsets of genes (2,
3 or 4), design classifier, compute the
empirical error, choose the minimum error
classifier. A supercomputer is required.
![Page 65: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/65.jpg)
Target Gene Predictor 1
Predictor 2
NMSE 1NMSE 2
Predictor 3
NMSE 3
Gene Regulation
![Page 66: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/66.jpg)
Gene Regulation
x1 x2 p(-1,x1,x2) p(0,x1,x2) p(1,x1,x2) p(x1,x2) y Error
-1 -1 0.05 0.1 0.05 0.2 0 0.1
-1 0 0.03 0.03 0.04 0.1 1 0.06
-1 1 0.02 0.01 0.07 0.1 1 0.03
0 -1 0.01 0.01 0.03 0.05 1 0.02
0 0 0.03 0.01 0.01 0.05 -1 0.02
0 1 0.07 0.1 0.03 0.2 0 0.1
1 -1 0.04 0.06 0.1 0.2 1 0.1
1 0 0.03 0.01 0.01 0.05 -1 0.02
1 1 0.02 0.02 0.01 0.05 -1 0.03
0.48
![Page 67: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/67.jpg)
Gene regulation
Cell line Condition RC
H1
BC
L3
FR
A1
RE
L-B
AT
F3
IAP
-1
PC
-1
MB
P-1
SS
AT
MD
M2
p21
p53
AH
A
OH
O
IR MM
S
UV
ML-1 IR -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
ML-1 MMS 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0
Molt4 IR -1 0 0 1 1 0 1 0 0 1 1 1 1 1 1 0 0
Molt4 MMS 0 0 1 0 1 0 0 0 0 0 1 1 1 0 0 1 0
SR IR -1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0
SR MMS 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 1 0
A549 IR 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0
A549 MMS 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0
A549 UV 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1
MCF7 IR -1 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0
MCF7 MMS 0 0 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0
MCF7 UV 0 0 1 1 1 0 0 0 0 1 1 1 1 0 0 0 1
RKO IR 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 0
RKO MMS 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0
RKO UV 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 1
CCRF-CEM IR -1 1 1 1 1 0 1 0 0 0 0 -1 -1 0 1 0 0
CCRF-CEM MMS 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 1 0
HL60 IR -1 1 0 1 1 0 1 0 1 0 1 -1 -1 -1 1 0 0
HL60 MMS 0 0 1 0 1 0 0 0 0 1 1 -1 0 1 0 1 0
K562 IR 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 1 0 0
K562 MMS 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 1 0
H1299 IR 0 0 0 1 0 0 1 0 0 0 0 -1 0 0 1 0 0
H1299 MMS 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 1 0
H1299 UV 0 0 0 0 1 0 1 0 0 0 1 -1 0 1 0 0 1
RKO/E6 IR -1 1 0 1 0 1 1 0 0 0 0 -1 -1 0 1 0 0
RKO/E6 MMS -1 0 0 0 1 0 0 0 0 0 1 -1 -1 1 0 1 0
RKO/E6 UV -1 0 0 0 1 0 0 0 0 0 1 -1 -1 1 0 0 1
T47D IR 0 0 0 1 0 0 0 0 0 0 1 -1 0 -1 1 0 0
T47D MMS 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 1 0
T47D UV 0 0 0 0 1 0 0 0 0 0 1 -1 0 1 0 0 1
Rows are cell lines subjected to different experimental conditions.
Comparisons are to the same cell line not exposed to the experimental treatment.
Genes Condition
![Page 68: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/68.jpg)
Gene regulation
• split data in two parts: 2/3 and 1/3
• 2/3: training the predictor
• 1/3: empirical error measure
• create all predictors with less than 4 genes
and measure their empirical error
![Page 69: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/69.jpg)
Gene regulation
• repeat for 256 random splitting and take
their mean empirical error
• choose the predictors with error less than
75%
![Page 70: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/70.jpg)
Gene Regulation
AHA p53
RCH1
0.4760.203
PC1
0.153
OHO RCH1
BCL3
0.9930.743
MDM2
0.723
p53 p21
MDM2
0.8050.612
0.810
![Page 71: Molecular Biology - USPjb/lectures/bioinformatics/gene-amsterdam.… · Molecular Biology: from sequence analysis to signal processing Junior Barrera University of Sao Paulo. Layout](https://reader035.vdocuments.net/reader035/viewer/2022063002/5f22772fc5e0827b02549c96/html5/thumbnails/71.jpg)
Gene Regulation
• some well known paths of the graph were
verified
• several unknown ones were suggested
• The possible new paths should be tested by
specific biochemical experiments