![Page 1: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/1.jpg)
6.096 – Algorithms for Computational Biology
Lecture 12
Biological Networks
Microarrays – Expression Clustering – Bayesian nets – Small-world nets
![Page 2: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/2.jpg)
Biological networks
Lecture 1 - Introduction
Lecture 2 - Hashing / BLAST
Lecture 3 - Combinatorial Motif Finding
Lecture 4 - Statistical Motif Finding
Lecture 5 - Sequence alignment and Dynamic Programming
Lecture 6 - RNA structure and Context Free Grammars
Lecture 7 - Gene finding and Hidden Markov Models
Lecture 8 - HMMs algorithms and Dynamic Programming
Lecture 9 - Evolutionary change, phylogenetic trees
Lecture 11 - Genome rearrangements, genome duplication
Lecture 12 - Biological networks, expression clustering, small worlds
![Page 3: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/3.jpg)
Challenges in Computational Biology
[Figure: map of challenges in computational biology around the DNA sequence TCATGCTATTCGTGATAATGAGGATATTTATCATATTTATGATTT. Labels: DNA, Gene Finding, Regulatory motif discovery, Database lookup, Gene expression analysis, RNA transcript, Sequence alignment, Evolutionary Theory, Cluster discovery, Gibbs sampling, Comparative Genomics, RNA folding. Numbered labels: 12 Biological networks, 13 Regulatory network inference, 14 Emerging network properties.]
![Page 4: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/4.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 5: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/5.jpg)
The idea behind DNA MicroArrays
• To measure levels of messages in a cell
– Construct an array with DNA sequences for 6000 genes
– Hybridize each RNA in your sample to a sequence in your array (all sequences from the same gene hybridize to the same spot)
– Measure the number of hybridizations for each spot
[Figure: an array with spots for Gene 1 to Gene 6; sample messages for Gene 1, Gene 4 and Gene 6 are hybridized to the array, and each spot is measured.]
![Page 6: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/6.jpg)
The first RNA expression observation
• Which colonies have expressed a particular gene?
Petri dish with bacterial colonies
Apply membrane and lift to make a filter replica containing DNA from each cDNA clone.
Probe with radioactively labeled DNA probe and image with film or phosphorimager to identify clones homologous to the probe.
Plastic Bag containing Hybridization Solution and Filter
![Page 7: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/7.jpg)
Today
• 6000 genes instead of one
• Entire transcriptome observable in one experiment
![Page 8: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/8.jpg)
Two ways of making DNA Arrays
• Printed slides (Stanford)
– PCR amplification of a 1 kb portion of the gene of interest (3′-most)
– Robotically apply each amplified sample on glass slide
– Chemically attach DNA to glass and thermally denature
– 6000 spots on 2x2 cm glass
• DNA chips (Affymetrix)
– Each gene provides several neighboring spots on array
– Sampled from various regions within gene (most unique)
– Synthesize oligonucleotides (20 bases) growing from glass
[Figure: gene1, gene2 and gene3 on each array type. Printed slides: one 1000-base probe per gene, one measurement per gene. DNA chips: several 20-base probes per gene, many measurements per gene, averaged.]
![Page 9: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/9.jpg)
Printed arrays: fixing DNA onto glass
Procedure
1. Array samples onto appropriate spot in glass array
2. Air-dry to fix the samples
3. UV-irradiation forms covalent bonds between T in DNA and positively charged amine groups on the silane slides
![Page 10: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/10.jpg)
DNA chips: Growing oligos on glass
• Making an oligonucleotide
[Figure: repeated synthesis cycle with steps labeled add (protected), bind, wash, unprotect.]
• Making a DNA chip (affymetrix.com)
![Page 11: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/11.jpg)
Fluorescently labeling the samples
• From RNA product, reverse transcribe to cDNA
– Use oligo-dT primer, dNTPs, and a low concentration of nucleotide analog labeled with fluorescent dye
– After hybridization step, wash away unhybridized cDNA
– Measure intensity of fluorescence
[Figure: RNA 1, RNA 4 and RNA 6 are reverse transcribed and labeled (RT+Label) into cDNA 1, cDNA 4 and cDNA 6, hybridized to an array with spots DNA 1 to DNA 6, and measured.]
![Page 12: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/12.jpg)
Printed Array: Comparing two conditions
![Page 13: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/13.jpg)
DNA chips: measuring absolute expression
• Compare perfect match affinity to mismatch affinity
![Page 14: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/14.jpg)
DNA chips: measuring absolute expression
• Statistical models yield gene expression value
– Separate signal from noise. Estimate noise from multiple observations. Measure uncertainty of each sample.

$$\text{Expression}_{\text{probe } i} = \frac{\text{Perfect Match}_i}{\text{Mismatch}_i}$$

$$\text{Expression}_{\text{gene}} = \operatorname{Avg}_{i \,\in\, \text{probes}} \left( \text{Expression}_{\text{probe } i} \right)$$
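A minimal sketch of the two formulas on this slide: a per-probe ratio of perfect-match to mismatch intensity, averaged over the probes of one gene. The function names and intensity values are made up for illustration, and a plain average stands in for the statistical models the slide refers to.

```python
from statistics import mean

def probe_expression(perfect_match: float, mismatch: float) -> float:
    """Expression of one probe: perfect-match intensity over mismatch intensity."""
    if mismatch <= 0:
        raise ValueError("mismatch intensity must be positive")
    return perfect_match / mismatch

def gene_expression(probe_pairs: list[tuple[float, float]]) -> float:
    """Average the per-probe ratios over all probes of one gene."""
    return mean(probe_expression(pm, mm) for pm, mm in probe_pairs)

# Hypothetical (perfect match, mismatch) intensities for three probes of one gene.
pairs = [(200.0, 100.0), (300.0, 100.0), (400.0, 100.0)]
print(gene_expression(pairs))
```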
![Page 15: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/15.jpg)
Modeling noise
• Sources of Noise
– Cross-hybridization
– Improper probe sequence
– Non-uniform hybridization kinetics
– Non-linearity of array response to concentration
– Non-linear amplification
• Estimating gene expression value
– Confidence intervals estimation
– Model is non-gaussian
– ML, MAP estimation
– See Gifford, Young, Jaakkola
![Page 16: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/16.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 17: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/17.jpg)
The problem
• Group genes into co-regulated sets
– Observe cells under different environmental changes
– Find genes whose expression profiles are affected in a similar way
– These genes are potentially co-regulated, i.e. regulated by the same transcription factor
![Page 18: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/18.jpg)
Expression Profiles
• Each coregulated set has a distinct expression profile
– Expression levels help group genes in a meaningful way
– In each group, genes have similar expression profiles
– Shown here are six expression profiles for an experiment. For each of the 18 time steps, the mean and spread of the expression levels in each group are shown
[Figure: six expression profiles; axes: Expression Level vs. Time.]
![Page 19: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/19.jpg)
Clustering expression levels
• Computational problem:
– How do we go from expression levels of 6000 genes to meaningful categories of possibly co-regulated genes?
• Method:
1. Signal to noise separation. Data normalization.
2. Feature extraction. Choose features to compare.
3. Clustering. Group genes into meaningful subsets.
4. Evaluation. Statistical significance of a grouping.
[Figure: pipeline of Signal to Noise separation, Feature Extraction and Clustering, drawn over features x1 to x5.]
![Page 20: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/20.jpg)
1. Expression Value Normalization
• Goal: Combining expression data across experiments
– Last lecture: differential expression guaranteed same conditions
– But values can vary with time, lab, concentrations, chemicals used
– How do we compare expression values across experiments?
• Noise model

$$y_{ij} = n_i \left[ a_{ij} \left( c_j \, t_{ij} \right) + e_{ij} \right]$$

– y_ij: observed level for gene j on chip i. t_ij: true level
– c_j: gene constant. n_i: multiplicative chip normalization
– a_ij, e_ij: multiplicative and additive noise terms
• Estimating the parameters
– n_i: spiked-in control probes, not present in genome studied
– c_j: control experiments of known concentrations for gene j
– e_ij: un-spiked control probes should be zero
– a_ij: spiked controls that are constant across chips
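A rough sketch of how the chip factor n_i could be estimated from spiked-in controls. It assumes the noise model has the form y_ij = n_i [a_ij (c_j t_ij) + e_ij], and, to keep the arithmetic visible, sets a_ij = 1 and e_ij = 0. The function names and control levels are hypothetical; a real estimator would have to handle the noise terms.

```python
from statistics import mean

def observe(n_i, c_j, t_ij, a_ij=1.0, e_ij=0.0):
    """Observed level under the assumed model y_ij = n_i * (a_ij * (c_j * t_ij) + e_ij)."""
    return n_i * (a_ij * (c_j * t_ij) + e_ij)

def estimate_chip_factor(observed, expected):
    """Estimate n_i as the mean ratio observed/expected over spiked-in controls."""
    return mean(y / x for y, x in zip(observed, expected))

# Hypothetical spiked-in controls with known levels (c_j = 1 for simplicity).
expected = [10.0, 20.0, 40.0]
chip_a = [observe(2.0, 1.0, t) for t in expected]   # chip with n_i = 2
chip_b = [observe(0.5, 1.0, t) for t in expected]   # chip with n_i = 0.5

n_a = estimate_chip_factor(chip_a, expected)
n_b = estimate_chip_factor(chip_b, expected)

# Dividing by the estimated factor puts both chips on a common scale.
normalized_a = [y / n_a for y in chip_a]
normalized_b = [y / n_b for y in chip_b]
print(n_a, n_b, normalized_a == normalized_b)
```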
![Page 21: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/21.jpg)
2. Feature extraction
• Select values which yield maximal group separation
– When clustering cell type, select most relevant genes
– In clustering genes, select most meaningful conditions
• Pre-process input
– Instead of clustering on direct observation of expression values…
– … can cluster based on differential expression from the mean
– … or differential expression normalized by standard deviation
• Sample correlation
– Genes 1 & 3: Absolute levels can be different, yet genes coregulated
– Genes 1 & 2: Absolute levels can be similar, yet genes unrelated
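[Figure: expression level over time for Gene1, Gene2 and Gene3.]

$$\sum_{i=1}^{N_{chips}} x_i \, y_i$$

$$\sum_{i=1}^{N_{chips}} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}$$

$$s(x, y) = \frac{\sum_{i=1}^{N_{chips}} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N_{chips}} (x_i - \bar{x})^2 \, \sum_{i=1}^{N_{chips}} (y_i - \bar{y})^2}}$$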
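The sample correlation can be sketched in a few lines. The three profiles below are invented to mirror the point of the slide: one pair differs in absolute level but has the same shape, the other pair has a similar level but unrelated shape.

```python
from math import sqrt

def pearson(x, y):
    """Sample correlation s(x, y) between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical profiles over four chips.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [3.0, 2.0, 2.0, 3.0]      # similar absolute level, unrelated shape
gene3 = [10.0, 20.0, 30.0, 40.0]  # different absolute level, same shape

print(round(pearson(gene1, gene3), 6))
print(round(pearson(gene1, gene2), 6))
```

Genes 1 and 3 come out fully correlated and genes 1 and 2 uncorrelated, which is why correlation is often preferred over raw distance for grouping profiles.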
![Page 22: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/22.jpg)
3. Clustering Algorithms
• Hierarchical: Split data successively to construct tree
[Figure: points a to h and the tree built over them, with leaves ordered a b d e f g h c.]
• Non-Hierarchical: place k means to best explain data
[Figure: points a to h grouped around three centers c1, c2, c3.]
![Page 23: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/23.jpg)
3a. Hierarchical clustering
• Bottom-up clustering. Greedy algorithm
– Construct minimum spanning tree
• Pre-processing
– Create a cluster for every data point
• Iterative greedy algorithm
– Find two nearest clusters
– Merge them into a single cluster
• Termination
– Return tree of joins
• O(n^2) comparisons + merge at every step. O(n) steps
• Computational cost tradeoffs depend on:
– Cluster representation, distance metric, merging algorithm
[Figure: points a to h and the resulting tree, with leaves ordered a b d e f g h c.]
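The greedy loop above can be sketched as follows. This is a naive version, assuming Euclidean distance between points and single-link distance between clusters (both are choices made here, not fixed by the slide); the example points are hypothetical.

```python
from itertools import combinations
from math import dist

def single_link(c1, c2):
    """Cluster distance: distance between the closest pair of points."""
    return min(dist(p, q) for p in c1 for q in c2)

def hierarchical_clustering(points):
    """Greedy bottom-up clustering; returns the tree of joins as a list of merges."""
    clusters = [(p,) for p in points]        # one cluster per data point
    joins = []
    while len(clusters) > 1:                 # O(n) merge steps
        # Compare all pairs of clusters to find the two nearest ones.
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)               # merge into a single cluster
        joins.append((a, b))
    return joins

# Hypothetical 2-D points: two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
for left, right in hierarchical_clustering(points):
    print(left, "+", right)
```

Recomputing every pairwise cluster distance at each step keeps the code short but is wasteful; caching distances or using a cheaper cluster representation is where the cost tradeoffs come in.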
![Page 24: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/24.jpg)
3b. K-means clustering
• Iterative algorithm: optimizing random solution
– If assignments of points are known, centers easily computed
– If centers are known, assignment of points easily computed
• General case: Expectation Maximization (EM algorithm)
– In k-means clustering, every point belongs to only one center
– In general case, probabilistic model. Every point belongs to all centers, to each with a probability that decreases with distance.
[Figure: three panels over points a to h and centers c1, c2, c3: Create random centers, Assign points to centers, Update centers.]
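A minimal sketch of the three steps in the figure (create random centers, assign points, update centers), assuming Euclidean distance. The seed parameter and the example points are illustrative choices, and initial centers are drawn from the data points, which is only one of several common initializations.

```python
import random
from math import dist

def k_means(points, k, iterations=100, seed=0):
    """Alternate point assignment and center update until centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # create random centers
    for _ in range(iterations):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Update each center to the mean of its assigned points.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:           # converged: assignments are stable
            break
        centers = new_centers
    return centers, groups

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, groups = k_means(points, k=2)
print(centers)
print(groups)
```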
![Page 25: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/25.jpg)
3c. Cluster Representation
• Operations supported by cluster data structure
– Hierarchical clustering
  • point2cluster(point) → cluster
  • clusters_distance(cluster1, cluster2) → distance
  • clusters_merge(cluster1, cluster2) → new_cluster
– K-means clustering
  • initialize k clusters
  • points2cluster(points) → cluster
  • cluster_distance(cluster, point) → distance
• Different representations optimize different metrics
– Mean of cluster points & number of points in cluster
  • Recompute mean when a new point is added to cluster by weighting appropriately
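One possible shape for the "mean plus count" representation, using the operation names from the slide. The `Cluster` class and the choice of Euclidean distance between means are assumptions made for this sketch.

```python
from dataclasses import dataclass
from math import dist

@dataclass(frozen=True)
class Cluster:
    """A cluster summarized by the mean of its points and the number of points."""
    mean: tuple[float, ...]
    size: int

def point2cluster(point):
    """Wrap a single point as a cluster of size 1."""
    return Cluster(tuple(point), 1)

def clusters_distance(c1, c2):
    """Distance between the two cluster means."""
    return dist(c1.mean, c2.mean)

def clusters_merge(c1, c2):
    """Merge two clusters; the new mean weights each old mean by its size."""
    size = c1.size + c2.size
    mean = tuple((a * c1.size + b * c2.size) / size
                 for a, b in zip(c1.mean, c2.mean))
    return Cluster(mean, size)

a = clusters_merge(point2cluster((0.0, 0.0)), point2cluster((2.0, 0.0)))
b = clusters_merge(a, point2cluster((4.0, 3.0)))
print(a, b, clusters_distance(a, b))
```

Storing only the mean and the count makes a merge constant-time per dimension, at the price of supporting only centroid-style distances.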
![Page 26: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/26.jpg)
3d. Cluster distance metrics
• Expression level distances for clusters not genes
– We already talked about different metrics for comparing individual genes. Differential expression. Correlation.
– Now we define d_cluster in terms of d_gene
• Defining the distance between two clusters
– Single-link method
– Centroid method
– Complete-link method
– Average-link method
[Figure: one panel per method, each drawn over points d, e, f, g, h.]
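The four cluster distances, written out under their usual textbook definitions: closest pair, farthest pair, mean over all cross-cluster pairs, and distance between centroids. Euclidean distance stands in for d_gene here, and the two small clusters are hypothetical.

```python
from math import dist
from statistics import mean

def single_link(c1, c2):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p in c1 for q in c2)

def average_link(c1, c2):
    """Mean distance over all cross-cluster pairs of points."""
    return mean(dist(p, q) for p in c1 for q in c2)

def centroid(c):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(mean(coord) for coord in zip(*c))

def centroid_link(c1, c2):
    """Distance between the two cluster centroids."""
    return dist(centroid(c1), centroid(c2))

# Hypothetical 1-D clusters, written as 1-tuples.
left = [(0.0,), (2.0,)]
right = [(5.0,), (9.0,)]
for method in (single_link, complete_link, average_link, centroid_link):
    print(method.__name__, method(left, right))
```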
![Page 27: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/27.jpg)
Depending on the application, different methods fit best
![Page 28: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/28.jpg)
Other clustering methods
• Hierarchical
– Complete tree structure, but not number of clusters
– Greedy, hence depends heavily on initial partitioning (two genes that are very similar can be separated)
• K-means clustering
– Need good estimate of the number of clusters
– Random assignment of initial centers can bias result
• Parametric methods
– Model entire density distribution of space. Fit models
• Self-organizing maps
– Make additional assumptions about geometry of clusters
![Page 29: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/29.jpg)
Evaluating clustering output
• Computing statistical significance of clusters
• N experiments, p labeled ++, (N-p) ––
• Cluster: k elements, m positive
• P-value of single cluster containing k elements out of which r are same

$$P(\text{pos} \ge r) = \sum_{m \ge r} \frac{\binom{p}{m} \binom{N-p}{k-m}}{\binom{N}{k}}$$

– Each term: prob that a randomly chosen set of k experiments would result in m positive and k-m negative
– The sum: P-value of uniformity in computed cluster
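A small sketch of this significance test, assuming it is the standard hypergeometric tail: the chance that a random set of k out of N experiments (p of them positive) contains at least r positives. The function names and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_pmf(N, p, k, m):
    """P(exactly m positives in a random set of k out of N experiments, p positive)."""
    return comb(p, m) * comb(N - p, k - m) / comb(N, k)

def cluster_p_value(N, p, k, r):
    """P(pos >= r): chance that a random set of k has at least r positives."""
    return sum(hypergeom_pmf(N, p, k, m) for m in range(r, min(k, p) + 1))

# Hypothetical numbers: 10 experiments, 5 positive; a cluster of 4, all positive.
print(cluster_p_value(N=10, p=5, k=4, r=4))
```

A small value says a cluster this uniform is unlikely to arise from a random draw of k experiments.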
![Page 30: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/30.jpg)
Visualizing clustering output
![Page 31: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/31.jpg)
Rearranging tree branches
• Optimizing one-dimensional ordering of tree leaves
[Figure: the same tree drawn with two different leaf orders: a b g h c d e f and a b d e f g h c.]
![Page 32: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/32.jpg)
O(n^3) algorithm for leaf ordering
• All possible orderings: O(2^n) orderings on n leaves
• Algorithmic improvements:
– Divide-and-conquer algorithm partitions tree
– Branch-and-bound allows early termination
– Works on k-ary trees
Bar-Joseph et al. Bioinformatics 2003
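To see where the exponential count comes from, the sketch below enumerates every leaf order of a small binary tree by flipping internal nodes and picks the order with the smallest sum of adjacent-leaf distances. This is brute force for illustration only, not the O(n^3) algorithm cited above; the tree, the 1-D positions and the cost function are hypothetical.

```python
def orderings(tree):
    """Yield every leaf order obtainable by flipping internal nodes of a binary tree.

    A tree is either a leaf label or a pair (left, right).
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for left_order in orderings(left):
        for right_order in orderings(right):
            yield left_order + right_order   # children in original order
            yield right_order + left_order   # children flipped

def order_cost(order, position):
    """Sum of distances between adjacent leaves in a one-dimensional ordering."""
    return sum(abs(position[a] - position[b]) for a, b in zip(order, order[1:]))

# Hypothetical tree over four leaves, with a 1-D value per leaf used as distance.
tree = (("a", "b"), ("c", "d"))
position = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 5.0}

all_orders = list(orderings(tree))
best = min(all_orders, key=lambda o: order_cost(o, position))
print(len(all_orders), best, order_cost(best, position))
```

With n leaves a binary tree has n-1 internal nodes, so there are 2^(n-1) orderings, which is why enumeration stops being practical quickly.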
![Page 33: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/33.jpg)
What have we learned?
• What have we done?
– Took expression values
– Normalized them
– Clustered them
• What have we obtained?
– List of possibly co-regulated genes
• What is missing?
– Causality: Identify transcription factor(s) responsible for the observed co-regulation
– Molecular basis: what promoter sequences are recognized by these transcription factors
![Page 34: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/34.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 35: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/35.jpg)
The question
• Which factor binds to which upstream region
– Gives causality of regulation.
– Initial networks can be built
• Correlating the binding location data with regulation
– Able to draw single-link connections
![Page 36: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/36.jpg)
Footprint experiments
• Most direct observation of binding
– Protection assay: digest nucleotides that are not protected by the presence of the transcription factor
– This is how molecular interactions of DNA and regulatory proteins were first described
– Gives the exact sequence at binding site
![Page 37: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/37.jpg)
Chromatin IP (ImmunoPrecipitation)
• Tag transcription factors
– Create antibody for transcription factor protein
– Able to pull factor out of a solution
• Bind intergenic region of interest
– Collect transcription factor using the specificity of the antibody
– Along with TF comes the intergenic region that it binds
• Measure levels of each region
![Page 38: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/38.jpg)
Location Analysis
• Chromatin IP on a chip
– Use microarray technology to measure the levels of each intergenic region
– Two samples: labeled differently, one enriched in intergenic regions bound by the particular transcription factor
– Measure levels of each intergenic region on a special chip
• Genomic scale
– In one experiment, we can observe the binding on every intergenic region of an entire genome
– However: need one experiment for each transcription factor
![Page 39: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/39.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 40: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/40.jpg)
Modeling the dependencies
• Binding and Regulation
– Regulation data depends on presence of binding
– Location data depends on binding but also other factors
• Conservation data
– Multiple species provide extra predictive power
– However, species observations are not independent
– Dependencies modeled with a phylogenetic tree
• Binding and motif conservation
– The conservation of a regulatory motif, and the binding of the factor specific to that motif are dependent on functionality of motif
• Environmental factors
– Binding may occur only in some conditions, not in others
![Page 41: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/41.jpg)
Bayesian network topology
![Page 42: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/42.jpg)
Galactose regulation
![Page 43: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/43.jpg)
Evaluating Alternative Hypotheses
![Page 44: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/44.jpg)
Scoring Bayesian models
Hartemink et al.
![Page 45: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/45.jpg)
Model Comparison
![Page 46: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/46.jpg)
Scoring all possible models
• Combinatorially many models
– Score variations point to models that best explain data
![Page 47: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/47.jpg)
Summary: Inferring regulatory networks
Expression Clustering
– Microarray technology allows genome-wide measurements
– Cluster co-regulated genes according to expression patterns
Location analysis
– Determine intergenic regions of TF binding
– Scan identified regions for common motifs
Bayes Networks
– Evaluate alternative hypotheses
– Select network topology
![Page 48: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/48.jpg)
Regulatory Networks: Example
![Page 49: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/49.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 50: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/50.jpg)
Recurring network motifs
• What are common patterns of interconnectivity?
[Figure: four motifs: feed-back, feed-forward, fan-out, sink.]
![Page 51: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/51.jpg)
Geodesic distance
• What is the shortest path between any two nodes?
• What is the diameter of the network?
• How many connected components are there?
• What is the size of the largest component?
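For an unweighted network, all four questions can be answered with breadth-first search. A minimal sketch, assuming an undirected graph stored as an adjacency dictionary; the example graph is made up.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def connected_components(graph):
    """List of components, each a set of nodes, for an undirected graph."""
    seen, components = set(), []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components

def diameter(graph, component):
    """Longest shortest path between any two nodes of one component."""
    return max(max(bfs_distances(graph, node).values()) for node in component)

# Hypothetical undirected graph: a path a-b-c-d plus a separate edge e-f.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
         "e": ["f"], "f": ["e"]}
components = connected_components(graph)
largest = max(components, key=len)
print(len(components), len(largest), diameter(graph, largest))
```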
![Page 52: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/52.jpg)
Clustering coefficient
• How likely are my friends to know each other?
[Figure: two example neighborhoods, one with C=1 and one with C=0.]
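A sketch of the local clustering coefficient under its usual definition: the fraction of pairs of a node's neighbors that are themselves linked. The two example graphs are hypothetical and reproduce the C=1 and C=0 extremes.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

# Hypothetical undirected graphs as adjacency sets.
triangle = {"me": {"x", "y"}, "x": {"me", "y"}, "y": {"me", "x"}}
star = {"me": {"x", "y", "z"}, "x": {"me"}, "y": {"me"}, "z": {"me"}}
print(clustering_coefficient(triangle, "me"), clustering_coefficient(star, "me"))
```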
![Page 53: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/53.jpg)
Small world networks
• High clustering coefficient
• Small path lengths
In a highly clustered, ordered network, a single random connection will create a shortcut that lowers the average path length L dramatically.
Watts demonstrates that small world properties can occur in graphs with a surprisingly small number of shortcuts.
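The effect of a shortcut can be checked on a toy example. The sketch below builds an ordered ring lattice, measures the average shortest-path length L, adds one long-range link, and measures again. The lattice size, the neighborhood width and the placement of the shortcut are arbitrary choices; the shortcut is fixed rather than random so the run is repeatable.

```python
from collections import deque

def ring_lattice(n, k):
    """Ordered network: each node linked to its k nearest neighbors on each side."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph

def average_path_length(graph):
    """Mean shortest-path length L over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

graph = ring_lattice(n=40, k=2)
before = average_path_length(graph)
graph[0].add(20)          # one long-range shortcut across the ring
graph[20].add(0)
after = average_path_length(graph)
print(before, after)
```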
![Page 54: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/54.jpg)
Emergence of small-world phenomenon
![Page 55: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/55.jpg)
Degree distribution
• What’s the average number of friends anyone has?
• Is this average representative of a typical person?
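A small illustration of why the average degree can mislead. The network below is a made-up extreme case, one hub linked to nine otherwise unconnected nodes; the function name is hypothetical.

```python
from collections import Counter
from statistics import mean, median

def degree_distribution(graph):
    """P(k): fraction of nodes that have exactly k links."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    counts = Counter(degrees)
    return {k: counts[k] / len(degrees) for k in sorted(counts)}

# Hypothetical network: one hub linked to nine otherwise unconnected nodes.
graph = {"hub": {f"n{i}" for i in range(9)}}
graph.update({f"n{i}": {"hub"} for i in range(9)})

degrees = [len(neighbors) for neighbors in graph.values()]
print(degree_distribution(graph))
print(mean(degrees), median(degrees))
```

The mean is pulled up by the single hub, while the median shows what a typical node looks like.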
![Page 56: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/56.jpg)
Scale Free Networks
• DEFINITION: Scale-free networks, including the Internet, are characterized by an uneven distribution of connectedness. Instead of the nodes of these networks having a random pattern of connections, some nodes act as "very connected" hubs, a fact that dramatically influences the way the network operates.
• Barabasi and his colleagues mapped the connectedness of the Web.
• Their experiment yielded a connectivity map that they called "scale-free".
![Page 57: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/57.jpg)
Scale Free vs. Random
Random networks suffer from random failures because each node is as important as any other
"Scale free" networks are more immune to random failure due to the redundancy of paths linking nodes
Connectivity is ensured by a few highly connected nodes
"Scale free" networks are prone to catastrophic failure when key "hubs" are attacked
![Page 58: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/58.jpg)
Yeast protein-protein interaction network
• High clustering coefficient / Short paths
![Page 59: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/59.jpg)
Scale-free networks are ubiquitous
• Web pages
• Internet routers
• Airports
• Power grid
• Social networks
• Boards of directors
• Scientific co-authorship
• Medline citations
• US patents
• Movie database
• Metabolic networks
• Protein interactions
• Regulatory networks
• Predator-prey networks
• Neuron connections
• Blood vessels
![Page 60: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/60.jpg)
Scientific authorship
• Hubs play central role in network connectivity
• Small number of cross-cluster interconnections
![Page 61: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/61.jpg)
How do scale-free networks emerge?
(a) A random network is constructed by laying down N nodes and connecting each pair with probability p. This network has N = 10 and p = 0.2.
(b) A new node (red) connects to two existing nodes in the network (black) at time t + 1. This new node is much more likely to connect to highly connected nodes, a phenomenon called preferential attachment.
(c) The network connectivity can be characterized by the probability P(k) that a node has k links. For random graphs P(k) is strongly peaked at k = <k> and decays exponentially for large k.
(d) A scale-free network does not have a peak in P(k), and decays as a power law P(k) ~ k^{-γ} at large k.
(e) A random network - most nodes have approximately the same number of links.
(f) The majority of nodes in a scale-free network have one or two links, but a few nodes (hubs) have a large number of links; this guarantees that the system is fully connected.
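The two growth rules in panels (a) and (b) can be sketched in a few lines, together with the empirical P(k) from panels (c) and (d). This is only an illustration under stated assumptions: the seed network of m + 1 fully connected nodes, the sizes n = 500 and m = 2, the random seed, and all function names are choices made here, not taken from the slide.

```python
import random
from collections import Counter

def random_graph_degrees(n, p, rng):
    """Panel (a): connect each of the n*(n-1)/2 node pairs with probability p."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def preferential_attachment_degrees(n, m, rng):
    """Panel (b): each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    degree = [0] * n
    # `stubs` holds every node once per link it has, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    stubs = []
    # Assumed seed network: m + 1 fully connected nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return degree

def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes that have exactly k links."""
    counts = Counter(degrees)
    return {k: c / len(degrees) for k, c in sorted(counts.items())}

rng = random.Random(0)
n, m = 500, 2
# Match the mean degree of the two models: <k> is about 2m in both.
er = random_graph_degrees(n, 2 * m / (n - 1), rng)
ba = preferential_attachment_degrees(n, m, rng)
print("random graph          max degree:", max(er))
print("preferential attach.  max degree:", max(ba))
print("P(k) under preferential attachment:", degree_distribution(ba))
```

Plotting the two distributions on log-log axes is the usual way to see the difference: the random graph falls off sharply around its mean, while the preferential-attachment graph tends to show a long tail of highly connected nodes.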
![Page 62: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/62.jpg)
Scale-free networks from bi-partite graphs
• Person belongs to multiple social groups
• Protein acts in multiple functional categories
• Author publishes to multiple fields
• Loose connections from group membership
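The step from a bipartite membership graph to an ordinary network can be sketched as a projection: two entities are linked whenever they share at least one group. The protein and category names below are hypothetical placeholders, and the sketch shows only the projection itself; it does not demonstrate that the resulting degree distribution is scale-free.

```python
from collections import defaultdict
from itertools import combinations

def project(membership):
    """Project entity -> groups onto an entity-entity network.

    Two entities are linked if they belong to at least one common group.
    """
    members = defaultdict(set)
    for entity, groups in membership.items():
        for group in groups:
            members[group].add(entity)
    adj = {entity: set() for entity in membership}
    for group_members in members.values():
        for u, v in combinations(sorted(group_members), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical example: proteins annotated with functional categories.
membership = {
    "P1": {"transport", "signaling"},
    "P2": {"transport"},
    "P3": {"signaling"},
    "P4": {"signaling", "metabolism"},
    "P5": {"metabolism"},
}
network = project(membership)
for protein in sorted(network):
    print(protein, sorted(network[protein]))
```

Entities that belong to several groups (P1 and P4 here) end up with the most links, which is how multiple memberships create loose connections between otherwise separate groups.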
![Page 63: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/63.jpg)
Implications of scale-free networks
• Hubs become important
– Random networks are subject to random failures
– Scale-free networks are unlikely to lose a hub to a random failure
– Scale-free networks are subject to directed attacks
• Biological implication
– Essential proteins in yeast often correspond to hubs
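The contrast between random failures and directed attacks can be illustrated on a toy hub-and-leaf network. The sketch below is an assumption-laden example, not a model of any real network: the graph (three chained hubs with ten leaves each) and the function names are invented for illustration. Since 30 of the 33 nodes are leaves, removing three leaves stands in for a typical random failure.

```python
from collections import deque

def build_hub_graph(n_hubs=3, leaves_per_hub=10):
    """Toy network: a chain of hubs, each with its own degree-1 leaves."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for h in range(n_hubs):
        hub = f"hub{h}"
        if h > 0:
            add_edge(f"hub{h - 1}", hub)
        for leaf in range(leaves_per_hub):
            add_edge(hub, f"leaf{h}_{leaf}")
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in removed and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

adj = build_hub_graph()
# Directed attack: remove the three most connected nodes.
hubs = sorted(adj, key=lambda node: len(adj[node]), reverse=True)[:3]
# Typical random failure: three low-degree nodes are lost.
leaves = ["leaf0_0", "leaf1_0", "leaf2_0"]
after_failure = largest_component(adj, leaves)
after_attack = largest_component(adj, hubs)
print("largest component after losing 3 leaves:", after_failure)  # 30
print("largest component after losing 3 hubs:  ", after_attack)   # 1
```

Losing leaves barely changes connectivity, while losing the hubs leaves only isolated nodes, which is the behavior the bullets above describe.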
![Page 64: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/64.jpg)
![Page 65: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/65.jpg)
Two types of hubs
• “Date hubs”
– Interconnections at different times
• “Party hubs”
– Interconnections are coordinated
• Different effects on network connectivity
– Date hubs bring together distinct components of the network
![Page 66: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/66.jpg)
Outline
Microarray technology
Clustering gene expression
TF binding: the controllers
Bayesian networks
Network properties
Scale-free networks
![Page 67: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/67.jpg)
![Page 68: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/68.jpg)
![Page 69: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/69.jpg)
![Page 70: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/70.jpg)
Regulatory Networks: Approaches
• Expression
– Finding possibly co-regulated genes
– By clustering expression profiles
• Location
– Which intergenic regions does a transcription factor bind?
– By chromatin immunoprecipitation
• Conservation
– Which sequence elements are conserved?
– Which genes share conserved sequence elements?
• Integration
– Bayesian networks for testing alternative hypotheses
![Page 71: 6.096 – Algorithms for Computational Biology Lecture 12 Biological Networks Microarrays – Expression Clustering – Bayesian nets – Small-world nets](https://reader036.vdocuments.net/reader036/viewer/2022081501/56649c9c5503460f9495bb40/html5/thumbnails/71.jpg)
Regulatory Networks: Goal
• Understand
– the molecular basis for
– all transcription regulatory interactions
– between every transcription factor,
– the intergenic elements it recognizes,
– the genes it controls,
– and the signal pathways it is involved in
– at a genomic scale
• Reconstruct global regulatory networks