alternativa 1alternativa 2mattina (fariselli)pomeriggio (martelli) lun 1 ott svm (ml)grafi (sb) mar...
Post on 19-Dec-2015
219 views
TRANSCRIPT
Alternativa 1 Alternativa 2 Mattina (Fariselli) Pomeriggio (Martelli)
Lun 1 Ott Lun 1 Ott SVM (ML) Grafi (SB)
Mar 2 Ott Mar 2 Ott System Biology (SB) Probabilita (ML)
Mer 3 Ott Mer 3 Ott System Biology (SB) HMM (ML)
Ven 5 Ott Lun 8 Ott System Biology (SB) HMM (ML)
Lun 8 Ott Mar 9 Ott Linux Linux e python
Mar 9 Ott Mer 10 Ott Esercitazione su SVM
Mer 10 Ott Gio 11 Ott Esercitazione su HMM
I corsi vengono integrati e conterranno grosso modo due moduli:SB: systems biologyML: machine learning
A branch of science that seeks to integrate different levels of information to understand how biological systems function.
It is not (only) the number and properties of system elements but their relations!!
L. Hood: “Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.”
Systems Biology. What Is It?
More on Systems BiologyMore on Systems Biology
Essence of living systems is flow of mass,energy, and information in space and time.
The flow occurs along specific networks
Flow of mass and energy (metabolic networks)
Flow of information involving DNA (transcriptional regulation networks)
Flow of information not involving DNA (signaling networks)
The Goal of Systems Biology: To understand the flow of mass, energy, and information in living systems.
Networks and the Core Concepts of Systems Biology
(i) Complexity emerges at all levels of the hierarchy of life
(ii) System properties emerge from interactions of components
(iii) The whole is more than the sum of the parts.
(iv) Applied mathematics provides approaches to modeling biological systems.
How to Describe a System As a Whole?
Networks - The Language of Complex Systems
Air Transportation Network
The World Wide Web
Fragment of a Social Network(Melburn, 2004)
Friendship among 450 people in Canberra
Biological NetworksA. Intra-Cellular Networks
Protein interaction networks
Metabolic Networks
Signaling NetworksGene Regulatory Networks
Composite networksNetworks of Modules, Functional Networks Disease networks
B. Inter-Cellular NetworksNeural Networks
C. Organ and Tissue Networks
D. Ecological Networks
E. Evolution Network
The Protein Interaction Network of Yeast
Yeast two hybridUetz et al, Nature 2000
Source: ExPASy
Metabolic Networks
Gene Regulation Networks
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
L-A Barabasi
miRNAregulation?
- -
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Functional Networks
Number of shared proteins
20125
55740
43221
33419
28692
12160
20147
7
83 65
41 49
172 75
321
24 260
11
596
35
103
Cell Cycle Cell Polarity & Structure
Intermediateand EnergyMetabolism
Protein Synthesisand Turnover
Protein RNA / Transport
RNAMetabolism
Signaling
Transcription/DNAMaintenance/Chromatin Structure
Number of protein complexes
Number of proteins 13111
8 6125 40
77 19 14
97
30 16 27
11
75 299
53
37
19
7 15
22187
33 73
13
94
MembraneBiogenesis &Turnover
Yeast: 1400 proteins, 232 complexes, nine functional groups of complexes
(Data A.-M. Gavin et al. (2002) Nature 415,141-147)
D. Bonchev, Chemistry & Biodiversity 1(2004)312-326
What is a Network?
Network is a mathematical structure composed of points connected by lines
Network Theory <-> Graph Theory
Network Graph
Nodes Vertices (points)
Links Edges (Lines)
A network can be build for any functional system
System vs. Parts = Networks vs. Nodes
The 7 bridges of Königsberg
The question is whether it is possible to walk with a route that crosses each bridge exactly once.
The representation of Euler
The shape of a graph may be distorted in any way without changing the graph itself, so long as the links between nodes are unchanged. It does not matter whether the links are straight or curved, or whether one node is to the left or right of another.
In 1736 Leonhard Euler formulated the problem in terms of abstracted the case of Königsberg:
1) by eliminating all features except the landmasses and the bridges connecting them;
2) by replacing each landmass with a dot (vertex) and each bridge with a line (edge).
3
3
5 3
The solution depends on the node degree
In a continuous path crossing the edges exactly once, each visited node requires an edge for entering and a different edge for exiting (except for the start and the end nodes).
A path crossing once each edge is called Eulerian path.It possible IF AND ONLY IF there are exactly two or zero nodes of odd degree. Since the graph corresponding to Königsberg has four nodes of odd degree, it cannot have an Eulerian path.
The solution depends on the node degree
If there are two nodes of odd degree, those must be the starting and ending points of an Eulerian path.
2
3
4
5
1
Start
6
End
Hamiltonian paths
Find a path visiting each node exactly one
Conditions of existence for Hamiltonian paths are not simple
Hamiltonian paths
Graph nomenclature
Graphs can be simple or multigraphs, depending on whetherthe interaction between two neighboring nodes is unique or can be multiple, respectively.
A node can have or not self loops
Graph nomenclature
Networks can be undirected or directed, depending on whether the interaction between two neighboring nodes proceeds in both directions or in only one of them, respectively.
The specificity of network nodes and links can be quantitatively characterized by weights
2.5
2.5
7.3 3.3 12.7
8.1
5.4
Vertex-Weighted Edge-Weighted
1 2 3 4 5 6
Networks having no cycles are termed trees. The
more cycles thenetwork has, the more complex it
is.
A network can be connected (presented by a
single component) or disconnected (presented by
several disjoint components).
connected disconnected
trees
cyclic graphs
Graph nomenclature
Paths
Stars
Cycles
Complete Graphs
Graph nomenclature
Large graphs = Networks
• Vertex degree distribution (the degree of a vertex is the number of vertices connected with it via an edge)
Statistical features of networks
• Clustering coefficient: the average proportion of neighbours of a vertex that are themselves neighbours
Statistical features of networks
Node4 Neighbours (N)
2 Connections among the Neighbours
6 possible connections among the Neighbours(Nx(N-1)/2)
Clustering for the node = 2/6
Clustering coefficient: Average over all the nodes
• Clustering coefficient: the average proportion of neighbours of a vertex that are themselves neighbours
Statistical features of networks
C=0
C=0
C=0
C=1
Given a pair of nodes, compute the shortest path between them
• Average shortest distance between two vertices
• Diameter: maximal shortest distance
Statistical features of networks
How many degrees of separation are they between two random people in the world, when friendship networks are considered?
How to compute the shortest path between home and work?
Edge-weighted Graph
The exaustive search can be too much time-consuming
The Dijkstra’s algorithm
Initialization:Fix the distance between “Casa” and
“Casa” equal to 0Compute the distance between “Casa” and
its neighboursSet the distance between “Casa” and its
NON-neighbours equal to ∞
Fixed nodesNON –fixed nodes
The Dijkstra’s algorithm
Iteration (1):Search the node with the minimum
distance among the NON-fixed nodes and Fix its distance, memorizing the incoming direction
Fixed nodesNON –fixed nodes
4
Iteration (2):
Update the distance of NON-fixed nodes, starting from the fixed distances
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
The updated distance is different from the previous one
Iteration:
Fix the NON-fixed nodes with minimum distance
Update the distance of NON-fixed nodes, starting from the fixed distances.
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
Iteration:
Fix the NON-fixed nodes with minimum distance
Update the distance of NON-fixed nodes, starting from the fixed distances.
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
Iteration:
Fix the NON-fixed nodes with minimum distance
Update the distance of NON-fixed nodes, starting from the fixed distances.
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
Iteration:
Fix the NON-fixed nodes with minimum distance
Update the distance of NON-fixed nodes, starting from the fixed distances.
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
Iteration:
Fix the NON-fixed nodes with minimum distance
Update the distance of NON-fixed nodes, starting from the fixed distances.
The Dijkstra’s algorithm
Fixed nodesNON –fixed nodes
Conclusion:
The label of each node represents the minimal distance from the starting node
The minimal path can be reconstructed with a back-tracing procedure
• Average shortest distance between two vertices
• Diameter: maximal shortest distance
Statistical features of networks
• Vertex degree distribution
• Clustering coefficient
Two reference models for networks
Regular network (lattice) Random network (Erdös+Renyi, 1959)
Each edge is randomly set with probability p
Regular connections
Two reference models for networksComparing networks with the same number of nodes (N) and edges
Degree distribution
Poisson distribution
!k
ekPk
Exp decay
Average shortest path
Average connectivity
≈ N ≈ log (N)
high low
Some examples for real networks
Network size vertex degree
shortest path
Shortest path in fitted random graph
Clustering
Clustering in random graph
Film actors 225,226 61 3.65 2.99 0.79 0.00027
MEDLINE coauthorship
1,520,251
18.1 4.6 4.91 0.43 1.8 x 10-4
E.Coli substrate graph
282 7.35 2.9 3.04 0.32 0.026
C.Elegans neuron network
282 14 2.65 2.25 0.28 0.05
Real networks are not regular (low shortest path)Real networks are not random (high clustering)
Adding randomness in a regular network
Random changes in edges
OR
Addition of random links
Adding randomness in a regular network(rewiring)
Networks with high clustering (like regular ones) and low path length (like random ones) can be obtained:SMALL WORLD NETWORKS (Strogatz and Watts, 1999)
Small World Networks
A small amount of random shortcuts can decrease the path length, still maintaining a high clustering: this model “explains” the 6-degrees of separations in human friendship network
What about the degree distribution in real networks?
Both random and small world models predict an approximate Poisson distribution: most of the values are near the mean;Exponential decay when k gets higher: P(k) ≈ e-k, for large k.
What about the degree distribution in real networks?
In 1999, modelling the WWW (pages: nodes; link: edges), Barabasi and Albert discover a slower than exponential decay:
P(k) ≈ k-a with 2 < a < 3, for large k
Scale-free networks
Networks that are characterized by a power-law degree distribution are highly non-uniform: most of the nodes have only a few links. A few nodes with a very large number of links, which are often called hubs, hold these nodes together. Networks with a power degree distribution are called scale-free
hubs
It is the same distribution of wealth following Pareto’s 20-80 law:Few people (20%) possess most of the wealth (80%), most of the people (80%) possess the rest (20%)
Hubs
Attacks to hubs can rapidly destroy the network
Three non biological scale-free networks
Albert and Barabasi, Science 1999
Note the log-log scaleLINEAR PLOT
kAkPkAkP loglog)(log)(
How can a scale-free network emerge?
Network growth models: start with one vertex.
How can a scale-free network emerge?
Network growth models: new vertex attaches to existing vertices by preferential attachment: vertex tends choose vertex according to vertex degree
In economy this is called Matthew’s effect: The rich get richerThis explain the Pareto’s distribution of wealth
How can a scale-free network emerge?
Network growth models: hubs emerge(in the WWW: new pages tend to link to existing, well linked pages)
Metabolic pathways are scale-free
Hubs are pyruvate, coenzyme A….
Protein interaction networks are scale-free
Degree is in some measure related to phenotypic effect upon gene knock-out
Red : lethalGreen: non lethalYellow: Unknown
Titz et al, Exp Review Proteomics, 2004
Caveat: different experiments give different results
How can a scale-free network emerge?
Gene duplication (and differentiation): duplicated genes give origin to a protein that interacts with the same proteins as the original protein (and then specializes its functions)
Caveat on the use of the scale-free theory
The same noisy data can be fitted in different ways
Finding a scale-free behaviour do NOT imply the growth with preferential attachment mechanism
A sub-net of a non-free-scale network can have a scale-free behaviour
Keller, BioEssays 2006
Hierarchical networks
Standard free scale models have low clustering: a modular hierarchical model accounts for high clustering, low average path and scale-freeness
Sub-graphs more represented than expected
Modules
209 bi-fan motifs found in the E.coli regulatory network
Summary
Many complex networks in nature and technology have common features.
They differ considerably from random networks of the same size
By studying network structure and dynamics, and by using comparative network analysis, one can get answers of important biological questions.
Fundamental Biological Questions to Answer
(i) Which interactions and groups of interactions are likely to have equivalent functions across species? (ii) Based on these similarities, can we predict new functional information about proteins and interactions that are poorly characterized?(iii) What do these relationships tell us about the evolution of proteins, networks and whole species? (iv) How to reduce the noise in biological data: Which interactions represent true binding events?
False-positive interaction is unlikely to be reproduced across the interaction maps of multiple species.
Barabasi and Oltvai (2004) Network Biology: understanding the cell’s functional organization. Nature Reviews Genetics 5:101-113
Stogatz (2001) Exploring complex networks. Nature 410:268-276
Hayes (2000) Graph theory in practice. American Scientist 88:9-13/104-109
Mason and Verwoerd (2006) Graph theory and networks in Biology
Keller (2005) Revisiting scale-free networks. BioEssays 27.10: 1060-1068