modelling and visualising biological systems - falk schreiber
DESCRIPTION
Two topics will be discussed in this tutorial: (1) constraint-based modelling of metabolic systems using Flux Balance Analysis (FBA) and (2) standardised visual representation of cellular processes and biological networks using the Systems Biology Graphical Notation (SBGN).
TRANSCRIPT
Falk Schreiber
Modelling and Visualising
Biological Systems
Bioinformatics
IPK Gatersleben
Institute of Computer Science
Martin Luther University
Halle-Wittenberg
IPK Gatersleben & MLU Halle-Wittenberg
Outline
1. Modelling metabolism
- Basics
- Constraint-based modelling: FBA
- Mathematical representation
- Application of constraints
- Example
- Resources and tools
2. Visualising models and networks
- Basics
- Standard graphical representation
- Process Description Language
- Resources and tools
Metabolic Modelling
Comprises the reconstruction,
simulation, and analysis of
metabolic models
Metabolic model
list of reactions and
associated properties,
assumed to be present in the
system under investigation,
along with
description of the environment
within which the biological
system is assumed to reside
Provides a basis for system-level
analysis of metabolism for
different organisms
Source: http://www.hydroponicist.com/pages/p69-oxygen-air.htm
Methods in Metabolic Modelling
(Figure axes: model size, model details)
+ kinetic rate laws
+ kinetic parameters
+ metabolite
concentrations
Topology
only network
structure
Petri Nets
+ stoichiometric
constraints
+ thermodynamics
Flux
Balance
+ mass balance
+ capacity
constraints
Kinetic
Flux Balance Analysis
Constraint-based stoichiometric modelling approach to predict
and analyse the metabolic steady state conversion rates (fluxes)
Advantages
No kinetic parameters required
Quantitative predictions
Applicable to large systems
Applications
Prediction of optimal metabolic yields and flux distributions
Prediction of phenotype/viability of knockout-mutants
Prediction of pathway redundancies
…
History of FBA
Principles of Flux Balance Analysis
Simulation
Oxygen level
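Reaction Network Formalism
b: exchange fluxes
v: internal fluxes
(Figure: reaction network; arrow directions are not preserved in the transcript)
R1: A, B (internal flux v1)
R2: A, C (internal flux v2)
R3: B, C (internal flux v3)
R4: A, A_ext (exchange flux b1)
R5: C, C_plast (exchange flux b2)
R6: B, B_plast (exchange flux b3)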
Stoichiometric Matrix
(nonzero pattern; entry signs are not preserved in the transcript)
     R1  R2  R3  R4  R5  R6
A     1   1   0   1   0   0
B     1   0   1   0   0   1
C     0   1   1   0   1   0
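As a minimal sketch, the stoichiometric matrix of the toy network can be assembled from a reaction list. The reaction directions used here (R1: A to B, R2: A to C, R3: B to C, R4: uptake of A, R5 and R6: export of C and B) are assumptions made for illustration, because the transcript preserves neither arrows nor matrix signs. `build_stoichiometric_matrix` is a hypothetical helper, not a function of any FBA tool.

```python
# Toy network: metabolites A, B, C and reactions R1..R6.
# ASSUMPTION: reaction directions are illustrative, not taken from the slides.
METABOLITES = ["A", "B", "C"]

# Each reaction maps an internal metabolite to its stoichiometric coefficient
# (negative = consumed, positive = produced). External pools are not balanced.
REACTIONS = {
    "R1": {"A": -1, "B": +1},   # internal flux v1
    "R2": {"A": -1, "C": +1},   # internal flux v2
    "R3": {"B": -1, "C": +1},   # internal flux v3
    "R4": {"A": +1},            # exchange flux b1 (uptake of A)
    "R5": {"C": -1},            # exchange flux b2 (export of C)
    "R6": {"B": -1},            # exchange flux b3 (export of B)
}


def build_stoichiometric_matrix(metabolites, reactions):
    """Return S as a list of rows: one row per metabolite, one column per reaction."""
    return [
        [coeffs.get(m, 0) for coeffs in reactions.values()]
        for m in metabolites
    ]


S = build_stoichiometric_matrix(METABOLITES, REACTIONS)
for name, row in zip(METABOLITES, S):
    print(name, row)
```

The positions of the nonzero entries agree with the matrix on the slide; only the signs depend on the assumed directions.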
Dynamic Mass Balance
Mass balance equations in matrix form: $\frac{dM}{dt} = S \cdot v$
S: stoichiometric matrix
v: flux vector (b: exchange fluxes, v: internal fluxes)
Steady State
Steady state assumption: $\frac{dM}{dt} = 0$
Steady state mass balance: thus $S \cdot v = 0$
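The steady state condition can be checked numerically for a candidate flux vector. The sketch below assumes the flux ordering (v1, v2, v3, b1, b2, b3) and an illustrative sign convention for S (R1: A to B, R2: A to C, R3: B to C, R4: uptake of A, R5 and R6: export of C and B); both are assumptions, as are the helper names and the flux values.

```python
# ASSUMPTION: the signs of S follow an illustrative reading of the toy network.
S = [
    [-1, -1,  0, 1,  0,  0],  # A
    [ 1,  0, -1, 0,  0, -1],  # B
    [ 0,  1,  1, 0, -1,  0],  # C
]


def mass_balance(S, v):
    """Return dM/dt = S . v for the flux vector v = (v1, v2, v3, b1, b2, b3)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite is produced exactly as fast as it is consumed."""
    return all(abs(x) <= tol for x in mass_balance(S, v))


balanced = [4, 6, 3, 10, 9, 1]     # A: 10 = 4 + 6, B: 4 = 3 + 1, C: 6 + 3 = 9
unbalanced = [4, 6, 3, 10, 5, 1]   # C is produced faster than it is exported

print(mass_balance(S, balanced))    # [0, 0, 0]
print(mass_balance(S, unbalanced))  # [0, 0, 4]
```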
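Metabolic Modelling
Constraints
Mass balance: $\frac{dM}{dt} = S \cdot v = 0$
Thermodynamic: directionality of reaction, $0 \le v_i$
Capacity: enzymatic capacity, nutrient availability, $\alpha_i \le v_i \le \beta_i$
(Figure: the constraints reduce the flux space, axes Flux B and Flux C, to the feasible solution space)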
Metabolic Modelling
Optimization
Optimization problem: maximize/minimize Z
Solved using linear programming
(Figure: feasible solution space in the Flux B / Flux C plane; moving in the direction of increasing Z leads to the optimal solution, max./min. Z)
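Taken together, the constraints and the objective form a linear program. The formulation below is a sketch in conventional notation; the weight vector $c$ and the bound symbols $\alpha_i$, $\beta_i$ are assumed names, chosen because they are common in the FBA literature.

```latex
\begin{aligned}
\max_{v}\quad & Z = c^{T} v \\
\text{subject to}\quad & S \cdot v = 0 \\
& \alpha_i \le v_i \le \beta_i \quad \text{for all reactions } i
\end{aligned}
```

Minimisation is obtained by maximising $-Z$.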
Example
System of two metabolites A and B
Production constraints
0 ≤ A ≤ 60 and 0 ≤ B ≤ 50
Capacity for simultaneous production
A + 2B ≤ 120
Objective function
Z = 20A + 30B
(Figure: feasible set in the Flux A / Flux B plane, bounded by A = 60, B = 50 and A + 2B = 120)
(Figure: objective lines Z = 1500 and Z = 2100; the optimal value within the feasible set is Z = 2100, reached at A = 60, B = 30)
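The example is small enough to solve by brute force. The sketch below enumerates the corner points of the feasible set and picks the one with the largest Z. This is plain vertex enumeration, not the simplex method an LP solver would use, and it assumes non-strict inequalities so that the optimum lies on the boundary.

```python
from fractions import Fraction
from itertools import combinations

# Constraints in the form a1*A + a2*B <= rhs.
CONSTRAINTS = [
    (1, 0, 60),    # A <= 60
    (0, 1, 50),    # B <= 50
    (1, 2, 120),   # A + 2B <= 120
    (-1, 0, 0),    # A >= 0
    (0, -1, 0),    # B >= 0
]


def objective(a, b):
    """Objective function Z = 20A + 30B."""
    return 20 * a + 30 * b


def vertices(constraints):
    """Yield feasible intersection points of pairs of constraint boundaries."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:            # parallel boundaries have no single intersection
            continue
        x = Fraction(c1 * b2 - c2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in constraints):
            yield x, y


best = max(set(vertices(CONSTRAINTS)), key=lambda p: objective(*p))
print(f"A = {best[0]}, B = {best[1]}, Z = {objective(*best)}")
# A = 60, B = 30, Z = 2100
```

Because a linear objective over a bounded polygon always attains its maximum at a corner, checking the five corners is sufficient here.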
Linear Programming: Types of Solutions
Objective Function
How to identify plausible physiological states?
Question Objective
What are the biochemical production
capabilities?
Maximize metabolite product
What is the maximal growth rate and
biomass yield?
Maximize growth rate
What is the trade-off between
biomass production and metabolite
overproduction?
Maximize biomass production for
a given metabolite production
How energetically efficient can
metabolism operate?
Minimize ATP production or
minimize nutrient uptake
Model Simulation and Analysis
Flux balance analysis: yield / flux predictions under varying environmental conditions
- multi parameter variation
Knockout analysis: yield / flux predictions under varying genetic backgrounds
- complete
- specified
Robustness analysis: objective function sensitivity to flux variation of a specific reaction
- complete
- specified
Flux variability analysis: predictions of min / max flux values
- complete
Objective Function: Growth Objective
Metabolite Demand (mmol)
ATP 41.2570
NADH -3.5470
NADPH 18.2250
G6P 0.2050
F6P 0.0709
R5P 0.8977
E4P 0.3610
T3P 0.1290
3PG 1.4960
PEP 0.5191
PYR 2.8328
AcCoA 3.7478
OAA 1.7867
AKG 1.0789
Metabolic demands of precursors and cofactors required for
1 g of biomass of E. coli.
$Z = 41.2570\,v_{ATP} - 3.547\,v_{NADH} + 18.225\,v_{NADPH} + \dots$
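The growth objective is a weighted sum of precursor and cofactor fluxes. A minimal sketch using the demand values from the table; the function name and the example flux values are hypothetical.

```python
# Demand coefficients (mmol per g of E. coli biomass), copied from the table.
DEMAND = {
    "ATP": 41.2570, "NADH": -3.5470, "NADPH": 18.2250, "G6P": 0.2050,
    "F6P": 0.0709, "R5P": 0.8977, "E4P": 0.3610, "T3P": 0.1290,
    "3PG": 1.4960, "PEP": 0.5191, "PYR": 2.8328, "AcCoA": 3.7478,
    "OAA": 1.7867, "AKG": 1.0789,
}


def growth_objective(fluxes):
    """Z = sum over metabolites m of demand_m * v_m (missing fluxes count as 0)."""
    return sum(coeff * fluxes.get(m, 0.0) for m, coeff in DEMAND.items())


# ASSUMPTION: made-up flux values, for illustration only.
example_fluxes = {"ATP": 1.0, "NADH": 1.0}
print(growth_objective(example_fluxes))
```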
Summary Flux Balance Analysis
Metabolism in the Hordeum vulgare Seed
FBA model of seed storage metabolism in
developing endosperm of Hordeum vulgare
Size
257 reactions, 234 metabolites
Pathways
Glyc, TCA, PPP, oxP, Ferm,
Rubisco, AA, Starch, CW, and
others
Case Study
Non-invasive imaging uncovers metabolic compartmentation
in the endosperm
Primary site of alanine synthesis is the central endosperm
Alanine gradients reflect local oxygen state of the
endosperm
13C-Ala gradient can be used as in vivo marker for hypoxia
Source of images:
L. Borisjuk and
H. Rolletschek, IPK
Case Study
Alanine-AT: critical branch point separating aerobic from
anaerobic metabolism
Modelling purpose: to elucidate role of alanine metabolism for
seed tissues with varying oxygen supply
Source of images:
L. Borisjuk and
H. Rolletschek, IPK
Simulation of Region-specific Metabolism
A: Central endosperm (hypoxic region)
B: Peripheral endosperm (aerobic region)
Regulation of Alanine-AT
Regulation of Alanine-AT in the endosperm in response to
changing oxygen supply
Current Research Directions
Model coupling
(different organs)
Multiscale modelling
(different modelling
methods)
Software Tools for FBA
CellNetAnalyzer (CNA)
http://www.mpi-magdeburg.mpg.de/projects/cna/cna.html
COBRA Toolbox
http://gcrg.ucsd.edu/downloads/COBRAToolbox
FBA-SimVis
http://fbasimvis.ipk-gatersleben.de/
Model Reconstruction: Metabolic Model
1. Model definition
Organism, organ, dev. stage, pathways, model boundaries
2. Model reconstruction & data retrieval
Top-down: metabolism – pathways – reactions
Integration of heterogeneous data types
Data types: biochemical, physiological, genomic data
Data basis: literature, databases
Missing data
Data referring to closely related species/organs/dev. stages
Inferred reactions: indirect, inferred from BM requirements
Unknown reaction directionality: assumed reversible; unknown compartment: assumed cytosol
Additional Parameters
Maximum uptake/excretion rates
Literature, experimental data, approximations
(e.g. related taxa)
Growth objective
Biomass composition
Energy requirements (growth, maintenance)
Literature, experimental data, approximations
(e.g. dev. stage)
2. Visualising models and networks
Question 1 – Can you Read this?
A network with
10^2 nodes
Protein interaction network,
source: Jeong et al. Nature,
2001
Question 1 – Can you Read this?
A network with
10^3 nodes
Metabolic network,
source: KEGG, 2012
Question 1 – Can you Read this?
A network with
10^4 nodes
Protein interaction network,
source: DIP, 2013
Part 1
A network with
10^4 nodes
Protein interaction network,
source: DIP, 2013
Automatic
layout of large
networks and
circuit-boards
Question 2 – Can you Understand this?
Is degraded?
Stimulates gene
transcription?
Associates into?
Translocates?
Reciprocal stimulation?
Part 2
Standardisation
of graphical
representation
Is degraded?
Stimulates gene
transcription?
Associates into?
Translocates?
Reciprocal stimulation?
Part 1
A network with
10^4 nodes
Protein interaction network,
source: DIP, 2013
Automatic
layout of large
networks and
circuit-boards
Automatic Layout of Networks
Force-based approaches
Simulate a system of physical forces
Eades. Congressus Numerantium, 1984.
Fruchterman & Reingold. Software - Practice and Experience, 1991.
Layered approaches
Decycling - layering - crossing reduction - coordinate assignment
Sugiyama et al. IEEE Transactions on Systems, Man and Cybernetics, 1981.
Orthogonal / grid-based approaches
Tamassia. SIAM Journal on Computing, 1987.
Biedl et al. Graph Drawing, LNCS 1353, 1997.
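To illustrate the force-based idea, here is a simplified sketch in the spirit of Fruchterman and Reingold: all node pairs repel, connected nodes attract, and a cooling temperature limits each step. The constants (step size, cooling factor, frame size) and the function name are assumptions chosen for illustration, not values from the cited papers.

```python
import math
import random


def force_layout(nodes, edges, width=1.0, height=1.0, iterations=100, seed=0):
    """Simplified force-directed layout; returns {node: (x, y)} inside the frame."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))   # ideal distance between nodes
    temperature = width / 10                     # maximum step per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive forces between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attractive forces along edges.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node by at most `temperature` and clamp it to the frame.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + disp[n][1] / length * step))
        temperature *= 0.95   # cool down so the layout settles
    return {n: (p[0], p[1]) for n, p in pos.items()}


layout = force_layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D")])
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

The all-pairs repulsion makes each iteration quadratic in the number of nodes, which is one reason large networks need more specialised algorithms.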
Many Special Layout Algorithms
Commonly extensions of the three classes of layout algorithms
Force-based
Layered
Orthogonal / grid-based
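For the layered class, the layering phase can be sketched with longest-path layering, one common choice among several. The sketch assumes the graph is already acyclic (the decycling phase has been done); the function name and the example graph are hypothetical.

```python
def longest_path_layering(nodes, edges):
    """Assign each node of a DAG to a layer.

    Sources go to layer 0; every other node is placed one layer below
    its deepest predecessor, so all edges point to a higher layer number.
    """
    preds = {n: [] for n in nodes}
    for u, v in edges:
        preds[v].append(u)

    layer = {}

    def depth(n):
        if n not in layer:
            layer[n] = 1 + max((depth(p) for p in preds[n]), default=-1)
        return layer[n]

    for n in nodes:
        depth(n)
    return layer


layers = longest_path_layering(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")],
)
print(layers)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

Crossing reduction and coordinate assignment would then operate on these layers.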
Examples
Source: Genc & Dogrusoz.
Graph Drawing LNCS 2912,
2004.
Source: Schreiber. In
Silico Biology, 2002.
Source: Becker & Rojas.
Bioinformatics, 2001.
Source: Karp & Paley. Conf.
Bioinformatics and Genome
Research, 1994.
Good Network Layout
Better layouts have
Fewer edge crossings
Large crossing angles
Straighter edges
Horizontal and vertical edges
Symmetrical parts shown symmetrically
…
Special layout algorithms
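Criteria such as "fewer edge crossings" can be measured directly. The sketch below counts crossing pairs of edges in a straight-line drawing using an orientation test; the function names and the example coordinates are hypothetical.

```python
from itertools import combinations


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 properly intersect (not just touch)."""
    return (_orientation(p1, p2, p3) * _orientation(p1, p2, p4) < 0
            and _orientation(p3, p4, p1) * _orientation(p3, p4, p2) < 0)


def count_crossings(pos, edges):
    """Count pairs of edges that cross in a straight-line drawing."""
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:     # edges sharing a node are not counted
            continue
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            crossings += 1
    return crossings


# Four nodes on the corners of a unit square.
pos = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
print(count_crossings(pos, [("A", "C"), ("B", "D")]))  # 1 (the two diagonals)
print(count_crossings(pos, [("A", "B"), ("D", "C")]))  # 0 (two parallel sides)
```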
Part 2
Standardisation
of graphical
representation
Ambiguity in Conventional Representation
Standardised Symbols are Important
(Figure panels: most English-speaking countries, Quebec, Iran, China, Israel, Singapore, Norway, Poland, USA and Canada)
Pathway Diagrams Have Been Used for a Long Time
Electrical circuit diagram representing cell membrane
From Hodgkin AL and Huxley AF (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117: 500-544.
A metabolic pathway diagram
From the wall chart of Biochemical Pathways created by Gerhard Michal (1968)
What is SBGN?
A way to unambiguously describe biochemical and
cellular events in graphs
Limited number of symbols (~30): smooth learning curve
Can graphically represent quantitative models,
biochemical pathways, at different levels of granularity
Developed since 2006 by a growing community, part of
COMBINE
Three languages
Process Descriptions: one state = one glyph
Entity Relationships: one entity = one glyph
Activity Flow: conceptual level
Graph Trinity: Three Languages in One
Process Description maps: unambiguous, mechanistic, sequential, combinatorial explosion
Entity Relationships maps: unambiguous, mechanistic, non-sequential
Activity Flow maps: ambiguous, conceptual, sequential
SBGN Process Description Language
A Process Description (PD) Diagram represents all molecular
processes and interactions occurring between various
biochemical entities
It depicts how entities transition from one form to another as a result of biochemical
reactions (including non-covalent modifications such as binding)
Most of the classic metabolic pathways (e.g., glycolysis and
TCA cycle) in biochemistry textbooks are drawn using this
approach
Though not the conventional approach for drawing signaling
pathways, this approach captures the details of biochemical
reactions within the pathway network and provides, in most
cases, unambiguous interpretation of pathway mechanisms
SBGN Process Description L1 V1.2 Reference Card
Pools of Entities
Collection of molecules
indistinguishable in some
sense
Non-overlapping
Characterized by
concentration
Entity Types
Unspecified entity, Simple chemical, Macromolecule, Nucleic acid feature
Material Type
Unit of information
Controlled vocabulary (SBO)
Indicates its chemical structure (physical composition)
General form: pre:label
Name                          Label
Non-macromolecular ion        mt:ion
Non-macromolecular radical    mt:rad
Ribonucleic acid              mt:rna
Deoxyribonucleic acid         mt:dna
Protein                       mt:prot
Polysaccharide                mt:psac
Example: PhyA with unit of information mt:prot
Conceptual Type
Unit of information
Controlled vocabulary (SBO)
Indicates its function within the context of a given PD map
General form: pre:label
Name                        Label
Gene                        ct:gene
Transcription start site    ct:tss
Gene coding region          ct:coding
Gene regulatory region      ct:grr
Messenger RNA               ct:mRNA
Example: crp with unit of information ct:grr
Macromolecular Pools: State Variables
A pool is a set of molecules that are indistinguishable in some sense
Molecules can be in different states
(Non)phosphorylated
Open/closed channel
Modified at some state
(Glyph examples: R and R with state variable P; Ch closed and Ch open; Kinase with state variables P@237 and 2P)
Stateless and State-full Entity Types
Not all entities can have states
Stateless
Simple chemicals
Unspecified entity
State-full entities
Macromolecule
Nucleic acid feature
Complex
A state is defined as a combination of state variable values
Once defined, a state variable should always be visible
(Glyph example: PhyA, mt:prot, state variable Pr/Prf)
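The distinction between stateless and state-full entity types can be captured in a small data model. This is a hypothetical sketch, not the data model of SBGN-ML or any SBGN tool; the class and field names are made up, and state variable values are stored as the strings shown on the glyph.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Entity types that may or may not carry state variables, as listed above.
STATEFUL_TYPES = {"macromolecule", "nucleic acid feature", "complex"}
STATELESS_TYPES = {"simple chemical", "unspecified entity"}


@dataclass(frozen=True)
class EntityPool:
    """One entity pool node; pools that differ in state are different pools."""
    label: str
    entity_type: str
    material_type: Optional[str] = None   # unit of information, e.g. "mt:prot"
    state: Tuple[str, ...] = ()           # state variable values, e.g. ("P@237",)

    def __post_init__(self):
        if self.entity_type not in STATEFUL_TYPES | STATELESS_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")
        if self.state and self.entity_type in STATELESS_TYPES:
            raise ValueError(f"{self.entity_type} cannot carry state variables")


kinase = EntityPool("Kinase", "macromolecule", "mt:prot")
kinase_p = EntityPool("Kinase", "macromolecule", "mt:prot", ("P@237",))
print(kinase == kinase_p)  # False: one state = one glyph
```

Making the class frozen keeps pools hashable, so they can serve as dictionary keys when a map is assembled.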
Example 1: LEC1/AFL-B3 Network
Macromolecules: biochemical substances that are built up from the
covalent linking of pseudo-identical units. Examples of
macromolecules include proteins, nucleic acids (RNA, DNA), and
polysaccharides (glycogen, cellulose, starch, etc.).
Complex and Multimer
Represent complexes of molecules held together by non-covalent bonds
Multimers require a cardinality
Can have state variables
In a multimer, a state variable means that all monomers have the same state
Use a complex if the states differ
(Glyph examples: multimers with cardinality N:3, N:5 and N:2; complex)
Key Concept: Process
Process: conversion of elements of one pool to another
Special cases
Non-covalent binding
- Association
- Dissociation
Incompleteness
- Uncertain process (glyph: ?)
- Omitted process (glyph: //)
(Glyphs shown: process, association, dissociation, uncertain process, omitted process)
LEC1/AFL-B3 Network
Omitted processes are processes that are known to exist, but
details are omitted from the map for the sake of clarity or parsimony.
A single omitted process can represent any number of actual
processes.
Arcs
Use of pools by a process
- Consumption/production
- Stoichiometry (optional)
Regulation of the process rate
- Stimulation
- Inhibition
- Catalysis
Requirement for a process
- Necessary stimulation
(Arc glyphs shown: consumption (here with stoichiometry 2), production, modulation, stimulation, catalysis, inhibition, necessary stimulation)
Laying out Process Arcs
In a reversible process, production arcs also represent consumption
Substrates and products should attach to opposite sides of the process shape (two connectors)
Regulatory arcs should attach to the other two sides of the process
If the forward and backward directions are regulated separately, the reversible process has to be split into two processes
LEC1/AFL-B3 Network
A stimulation positively affects the flux of the process represented by
the target process node.
Sink/source: Creation and Destruction
Represents creation and destruction of entities
Shape to represent source of materials and sink of degraded
entities
LEC1/AFL-B3 network
A submap is used to encapsulate processes (including all types of
nodes and edges) within one glyph. The submap hides its content
from the user and displays only its input terminals (or ports).
LEC1/AFL-B3 Factors and Maturation Gene Control
Environmental Influence
External influences: Perturbing agent
Light
Temperature change
Mutation/disease
System manifestation: Phenotype
Apoptosis
(Glyphs shown: perturbing agent, phenotype)
LEC1/AFL-B3 Factors and Maturation Gene Control
The phenotype glyph represents biological processes or phenotypes
that are affected or generated by a biochemical/regulatory network.
Such processes can take place at different levels and are
independent of the biochemical network itself.
Clone Marker
Each entity pool is represented only once on the map
This causes layout problems
Clone marker as visual indicator of duplication
Stateless nodes carry an unnamed marker
State-full nodes carry a named marker to simplify recognition
(Glyph example: LABEL with clone marker)
LEC1/AFL-B3 Factors and Maturation Gene Control
If an EPN is duplicated on a map, it is necessary to indicate this fact by
using the clone marker auxiliary unit. The purpose of this marker is to
provide the reader with a visual indication that this node has been
cloned, and that at least one other occurrence of the EPN can be
found in the map.
Discrimination Between Knowledge Levels
(Figure: transcription factor + target gene DNA complex; transcription; translation)
Transcription factors and target gene DNA together stimulate transcription, translation
Transcription factor stimulates the transcription of several putative target genes
Compartments
Container to represent physical or logical structure
Free form
Visually thicker line
The same entity pools in different compartments are different
Compartments are independent
Overlap does not mean containment
Example: neuro-muscular junction
Logical Gates
Encode network logic
To simplify layout
- If there are many activators for the process
To include uncertain information
- Combination of TFs with unknown or combinatorial binding kinetics
Three main logic operations
AND: all inputs are required
OR: any input (or any combination of inputs) is sufficient
NOT: prevents the influence
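The three operators behave like their Boolean counterparts. A minimal sketch, assuming each input is reduced to present (True) or absent (False); the function name and the transcription factor example are hypothetical.

```python
def evaluate_gate(kind, inputs):
    """Evaluate a logical operator on a list of boolean inputs."""
    if kind == "AND":
        return all(inputs)          # every input is required
    if kind == "OR":
        return any(inputs)          # a single input is sufficient
    if kind == "NOT":
        if len(inputs) != 1:
            raise ValueError("NOT takes exactly one input")
        return not inputs[0]        # the input prevents the influence
    raise ValueError(f"unknown gate: {kind}")


# Two transcription factors that must both be bound to stimulate transcription.
tf1_bound, tf2_bound = True, False
print(evaluate_gate("AND", [tf1_bound, tf2_bound]))  # False
print(evaluate_gate("OR", [tf1_bound, tf2_bound]))   # True
print(evaluate_gate("NOT", [tf2_bound]))             # True
```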
Strengths and Weaknesses of SBGN-PD
Strengths
Easy translation into a mathematical model
Natural mapping to SBML
A lot of information in databases
- KEGG
- Panther
Timeline is easily extractable
Weaknesses
Full explicit definition of state
Combinatorial complexity
Additional assumptions needed to include uncertain information
Laborious creation
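The translation into a mathematical model can be sketched as follows: every process becomes a column of a stoichiometric matrix, with consumption arcs contributing negative and production arcs positive entries. The pools, processes and stoichiometries below are made up for illustration, and the dictionary layout is an assumption, not an SBGN or SBML data format.

```python
# Each process lists its consumption and production arcs as (pool, stoichiometry).
# ASSUMPTION: hypothetical pools and processes, for illustration only.
PROCESSES = {
    "association": {"consumes": [("TF", 1), ("DNA", 1)], "produces": [("TF:DNA", 1)]},
    "dimerisation": {"consumes": [("TF", 2)], "produces": [("TF2", 1)]},
}


def processes_to_matrix(processes):
    """Return (pools, matrix): one row per entity pool, one column per process."""
    pools = sorted({
        pool
        for proc in processes.values()
        for arcs in proc.values()
        for pool, _ in arcs
    })
    index = {pool: i for i, pool in enumerate(pools)}
    matrix = [[0] * len(processes) for _ in pools]
    for j, proc in enumerate(processes.values()):
        for pool, n in proc["consumes"]:
            matrix[index[pool]][j] -= n   # consumption arc
        for pool, n in proc["produces"]:
            matrix[index[pool]][j] += n   # production arc
    return pools, matrix


pools, matrix = processes_to_matrix(PROCESSES)
for pool, row in zip(pools, matrix):
    print(pool, row)
```

Regulatory arcs such as catalysis or inhibition do not enter the matrix; in a kinetic model they would shape the rate laws instead.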
SBGN Process Description L1 V1.2 Reference Card
SBGN Process Description - Entity Pool Nodes
SBGN Process Description - Process Nodes
SBGN Process Description - Connecting Arcs
Software Tools for SBGN
SBGN
http://www.sbgn.org
SBGN-ED
http://www.sbgn-ed.org
Source: Demir et al. Nature Biotechnology, 2012.
Standards in Systems Biology
High Throughput Modelling and Visualisation
Path2Models: a pipeline to produce models that combine data from different sources
140,000 kinetic, logical and constraint-based models
Part of BioModelsDB
Path2Models team: F. Büchel, T. Czauderna, C. Chaouiya, A. Dräger, M. Glont, H. Hermjakob, M. Hucka, S. Keating, D.B. Kell, R. Keller, C. Laibe, N. Le Novère, P. Mendes, F. Mittag, M. Rall, N. Rodriguez, J. Saez-Rodriguez, F. Schreiber, M. Schubert, N. Swainston, M. van Iersel, C. Wrzodek, M. Wybrow, A. Zell