François Fages, MPRI Bio-info 2007: Formal Biology of the Cell. Modeling, Computing and Reasoning with Constraints

Post on 14-Dec-2015

Slide 1

François Fages, MPRI Bio-info 2007

Formal Biology of the Cell
Modeling, Computing and Reasoning with Constraints

François Fages, Constraint Programming Group, INRIA Rocquencourt
mailto:[email protected]
http://contraintes.inria.fr/

Transpose concepts and tools from programming theory to systems biology:
- Formal Methods of Program Verification to Systems Biology
- Constraint Logic Programming and Constraint-based Model Checking

In the course:
- Learn bits of cell biology through computational models
- Develop new formalisms, languages and algorithms coming from biological questions

Slide 2

Systems Biology

Multidisciplinary field aiming at getting over the complexity walls to reason about biological processes at the system level.
Conferences ICSB, CMSB; journal TCSB, ...
Virtual cell: emulate high-level biological processes in terms of their biochemical basis at the molecular level (in silico experiments).
Bioinformatics: end of the 90s, genomic sequences → post-genomic data (RNA expression, protein synthesis, protein-protein interactions, ...)

Need for a strong effort on:
- the formal representation of biological processes,
- formal tools for modeling and reasoning about their global behavior.

Slide 3

Language Approach to Cell Systems Biology

Qualitative models: from diagrammatic notation to
- Boolean networks [Thomas 73]
- Petri Nets [Reddy 93]
- Milner's π-calculus [Regev-Silverman-Shapiro 99-01, Nagasaki et al. 00]
- Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03]
- Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02]
- Transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 04]
- Biochemical abstract machine BIOCHAM-1 [Chabrier-Fages 03]

Quantitative models: from differential equation systems to
- Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00]
- Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01]
- Hybrid concurrent constraint languages [Bockmayr-Courtois 01]
- Rules with continuous dynamics BIOCHAM-2 [Chabrier-Fages-Soliman 04]

Slide 4

The Biochemical Abstract Machine BIOCHAM

Software environment based on two formal languages:
1. Biocham Rule Language for Modeling Biochemical Systems
   1. Syntax of molecules, compartments and reactions
   2. Semantics at 3 abstraction levels: Boolean, Concentrations, Populations
2. Biocham Temporal Logic for Formalizing Biological Properties
   1. CTL for Boolean semantics
   2. Constraint LTL for concentration semantics, PCTL for stochastic semantics

Machine learning Rules and Parameters from Temporal Properties
1. Learning reaction rules from CTL specification
2. Learning kinetic parameter values from Constraint-LTL specification

Internship topics: http://contraintes.inria.fr

Slide 5

Overview of the Lectures

1. Formal molecules and reaction rules in BIOCHAM.
2. Formal biological properties in temporal logic. Symbolic model-checking.
3. Continuous dynamics. Kinetics and transport models.
4. Computational models of the cell cycle control.
5. Abstract interpretation and typing of biochemical networks.
6. Machine learning reaction rules from temporal properties.
7. Constraint-based model checking. Learning kinetic parameter values.
8. Constraint Logic Programming approach to protein structure prediction.

Slide 6

References

- A wonderful textbook: Molecular Cell Biology. 5th Edition, 1100 pages + CD. Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Freeman Publ., Nov. 2003.
- Modeling dynamic phenomena in molecular and cellular biology. Segel. Cambridge Univ. Press, 1987.
- Modeling and querying bio-molecular interaction networks. Chabrier, Chiaverini, Danos, Fages, Schächter. Theoretical Computer Science, 04.
- Machine learning biochemical reaction networks. Calzone, Chabrier, Fages, Soliman. Trans. Comp. Syst. Biology, 2006.
- The Biochemical Abstract Machine BIOCHAM. Fages, Soliman. http://contraintes.inria.fr/BIOCHAM

Slide 7

Map of Course 1

1. BIOCHAM syntax
   - Proteins: complexation and phosphorylation
   - DNA and genes: replication and transcription
   - Reaction and transport rules
2. Boolean semantics: concurrent transition system, Kripke structure
   - States and transitions
   - Examples: RTK membrane receptors, MAPK signaling pathways

Slide 8

2. Syntax: a Simple Algebra of Cell Molecules

Small molecules: covalent bonds, 50-200 kcal/mol
- 70% water
- 1% ions
- 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, ...

Macromolecules: hydrogen bonds, ionic, hydrophobic, van der Waals, 1-5 kcal/mol
Stability and bindings determined by the number of weak bonds: 3D shape
- 20% proteins (50-10^4 amino acids)
- RNA (10^2-10^4 nucleotides AGCU)
- DNA (10^2-10^6 nucleotides AGCT)

Slide 9

Structure Levels of Proteins

1) Primary structure: word of n amino acid residues (20^n possibilities) linked with C-N bonds
   Example: MPRI = Methionine-Proline-Arginine-Isoleucine
2) Secondary structure: word of m α-helices, β-strands, random coils, ... (3^m-10^m possibilities), stabilized by hydrogen bonds H---O
3) Tertiary 3D structure: spatial folding stabilized by hydrophobic interactions

Slide 10

Formal proteins (BIOCHAM syntax)

Cyclin dependent kinase 1:   Cdk1                 (free, inactive)
Complex Cdk1-Cyclin B:       Cdk1-CycB            (low activity)
Phosphorylated form:         Cdk1~{thr161}-CycB   at site threonine 161 (high activity)

Slide 11

Deoxyribonucleic Acid DNA

1) Primary structure: word over 4 nucleotides Adenine, Guanine, Cytosine, Thymine
2) Secondary structure: double helix of pairs A--T and C---G stabilized by hydrogen bonds

Slides 12-13

DNA: Genome Size

Species                 Genome size   Chromosomes   Coding DNA
E. Coli (bacteria)      5 Mb          1 circular    100 %
S. Cerevisae (yeast)    12 Mb         16            70 %
Mouse, Human            3 Gb          20, 23        15 %

3,200,000,000 pairs of nucleotides
Single nucleotide polymorphism: 1 / 2 kb

Slides 14-15

Genome Size

Species                 Genome size   Chromosomes   Coding DNA
E. Coli (bacteria)      4 Mb          1             100 %
S. Cerevisae (yeast)    12 Mb         16            70 %
Mouse, Human            3 Gb          20, 23        15 %
Onion                   15 Gb         8             1 %
Lungfish                140 Gb                      0.7 %

Slide 16

DNA Replication

Separation of the two helices and production of one complementary strand for each copy (from one or several starting points of replication).

Slide 17

Syntax of Genes

Part of DNA, unique: #E2
Activation (binding of promotion factor): #E2-E2f13-DP12
Repression: binding of another molecule

Slide 18

Transcription: DNA gene → pRNA → mRNA → Protein

Genes: parts of DNA
1. Activation (inhibition): transcription factors (inhibitors) bind to the regulatory region of the gene
   #E2 + E2F13-DP12 => #E2-E2F13-DP12
2. Transcription: RNA polymerase copies the DNA from start to stop positions into a single-stranded premature messenger pRNA
   _ =[#E2-E2F13-DP12]=> pRNAcycA
3. (Alternative) splicing: non-coding regions of pRNA are removed, giving mature messenger mRNA
   pRNAcycA => mRNAcycA
4. Protein synthesis: mRNA moves to the cytoplasm and binds to a ribosome to assemble a protein
   mRNAcycA => mRNAcycA::cyt
   mRNAcycA::cyt + ribosome::cyt => cycA::cyt

Slide 19

BIOCHAM Syntax of Objects

E == compound | E-E | E~{p1,...,pn}
  Compound: molecule, #gene binding site, abstract @process
  - : binding operator for protein complexes, gene binding sites, ... Associative and commutative.
  ~{...} : modification operator for phosphorylated sites, ... Set of modified sites (associative, commutative, idempotent).

O == E | E::location
  Location: symbolic compartment (nucleus, cytoplasm, membrane, ...)

S == _ | O+S
  + : solution operator (associative, commutative, neutral element _)

Slides 20-23

Elementary Rule Schemas

Complexation: A + B => A-B    Decomplexation: A-B => A + B
  cdk1 + cycB => cdk1-cycB

Phosphorylation: A =[C]=> A~{p}    Dephosphorylation: A~{p} =[C]=> A
  Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB
  Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB

Synthesis: _ =[C]=> A    Degradation: A =[C]=> _
  _ =[#Ge2-E2f13-Dp12]=> cycA
  cycE =[@UbiPro]=> _    (not for cycE-cdk2, which is stable)

Transport: A::L1 => A::L2
  Cdk1~{p}-CycB::cytoplasm => Cdk1~{p}-CycB::nucleus

Slide 24

From Syntax to Semantics

R ::= S => S | kinetic-expression for R
A =[C]=> B stands for A+C => B+C
A <=> B stands for A=>B and B=>A, etc.
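The two abbreviations just stated (a catalyzed rule `A =[C]=> B` standing for `A+C => B+C`, and a reversible rule standing for two one-way rules) can be illustrated with a small string-rewriting sketch. This is only an illustration under assumptions: the function name `expand_rule` and the handling of the empty solution `_` are hypothetical, and this is not how BIOCHAM itself parses rules.

```python
import re


def expand_rule(rule: str) -> list[str]:
    """Expand rule shorthand into plain 'lhs => rhs' rules (illustrative only)."""
    rule = rule.strip().rstrip(".")

    # Reversible rule: A <=> B stands for A => B and B => A.
    if "<=>" in rule:
        lhs, rhs = (side.strip() for side in rule.split("<=>"))
        return [f"{lhs} => {rhs}", f"{rhs} => {lhs}"]

    # Catalyzed rule: A =[C]=> B stands for A + C => B + C.
    match = re.fullmatch(r"(.*?)=\[(.*?)\]=>(.*)", rule)
    if match:
        lhs, catalyst, rhs = (part.strip() for part in match.groups())

        def with_catalyst(side: str) -> str:
            # '_' is the empty solution, so only the catalyst remains.
            return catalyst if side == "_" else f"{side} + {catalyst}"

        return [f"{with_catalyst(lhs)} => {with_catalyst(rhs)}"]

    # Already a plain rule.
    return [rule]


for r in ["A =[C]=> B", "A <=> B", "Cdk1-CycB =[Myt1]=> Cdk1~{thr161}-CycB"]:
    print(r, "->", expand_rule(r))
```

The sketch deliberately ignores combined forms (for example a reversible rule that is also catalyzed) and pattern variables.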
Systems Biology Markup Language (SBML): exchange format, no semantics

BIOCHAM: three abstraction levels
1. Boolean semantics: presence-absence of molecules
   1. Concurrent transition system (asynchronous, non-deterministic)
2. Differential semantics: concentration
   1. Ordinary differential equations or hybrid system (deterministic)
3. Stochastic semantics: number of molecules
   1. Continuous-time Markov chain

Slides 25-28

The Actin-Myosin two-stroke Engine with ATP fuel

Myosin + ATP => Myosin-ATP
Myosin-ATP => Myosin + ADP

http://www.sci.sdsu.edu/movies
http://www-rocq.inria.fr/sosso/icema2

Slide 29

Cell to Cell Signaling by Hormones and Receptors

Signals: insulin, adrenaline, steroids, EGF, ..., Delta, ..., nutrients, light, pressure, ...
Receptors: tyrosine kinases, G-protein coupled, Notch, ...

L + R <=> L-R
RAS-GDP =[L-R]=> RAS-GTP

Slide 30

Five MAP Kinase Pathways in Budding Yeast (Saccharomyces Cerevisiae)

Slide 31

MAPK Signaling Pathways

Input: RAF, activated by the receptor
  RAF-p14-3-3 + RAS-GTP => RAF + p14-3-3 + RAS-GDP
Output: MAPK~{T183,Y185} moves to the nucleus and phosphorylates a transcription factor, which stimulates gene transcription

Slide 32

MAPK Signaling Pathway in BIOCHAM

RAF + RAFK <=> RAF-RAFK.
RAF-RAFK => RAFK + RAF~{p1}.
RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
RAF~{p1}-RAFPH => RAF + RAFPH.
MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
MEK~{p1}-MEKPH => MEK + MEKPH.
MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.

Pattern variables $P for phosphorylation sites
Molecules with constraints
BIOCHAM rules are expanded into BIOCHAM-0 rules without patterns

Slide 33

Reaction Model of the MAPK Cascade [Levchenko et al. PNAS 2000]

(MA(1),MA(0.4)) for RAF + RAFK <=> RAF-RAFK.
(MA(0.5),MA(0.5)) for RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH.
(MA(3.3),MA(0.42)) for MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} where p2 not in $P.
(MA(10),MA(0.8)) for MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH.
(MA(20),MA(0.7)) for MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} where p2 not in $P.
(MA(5),MA(0.4)) for MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH.
MA(0.1) for RAF-RAFK => RAFK + RAF~{p1}.
MA(0.1) for RAF~{p1}-RAFPH => RAF + RAFPH.
MA(0.1) for MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}.
MA(0.1) for MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}.
MA(0.1) for MEK~{p1}-MEKPH => MEK + MEKPH.
MA(0.1) for MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH.
MA(0.1) for MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}.
MA(0.1) for MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2} + MEK~{p1,p2}.
MA(0.1) for MAPK~{p1}-MAPKPH => MAPK + MAPKPH.
MA(0.1) for MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.
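To make the mass-action annotations concrete, here is a minimal sketch of the differential semantics for the first two lines of the cascade only (RAF binding RAFK, then release of RAF~{p1}). It assumes that MA(k) denotes a rate equal to k times the product of the reactant concentrations, and it uses a plain explicit Euler scheme. The initial concentrations, the step size and all function names are hypothetical choices for illustration; they do not come from the slides or from the BIOCHAM simulator.

```python
from math import prod

# (rate constant k, reactants, products); MA(k) is read as
# rate = k * product of reactant concentrations (assumed mass-action law).
REACTIONS = [
    (1.0, ["RAF", "RAFK"], ["RAF-RAFK"]),       # forward half of (MA(1),MA(0.4))
    (0.4, ["RAF-RAFK"], ["RAF", "RAFK"]),       # backward half of (MA(1),MA(0.4))
    (0.1, ["RAF-RAFK"], ["RAFK", "RAF~{p1}"]),  # MA(0.1)
]


def mass_action_rate(k, reactants, conc):
    """Rate of one reaction under the mass-action law."""
    return k * prod(conc[m] for m in reactants)


def derivatives(conc):
    """Time derivative of every concentration (right-hand side of the ODE system)."""
    d = {m: 0.0 for m in conc}
    for k, reactants, products in REACTIONS:
        v = mass_action_rate(k, reactants, conc)
        for m in reactants:
            d[m] -= v
        for m in products:
            d[m] += v
    return d


def euler(conc, dt=0.01, steps=1000):
    """Integrate the ODE system with fixed-step explicit Euler."""
    conc = dict(conc)
    for _ in range(steps):
        d = derivatives(conc)
        for m in conc:
            conc[m] += dt * d[m]
    return conc


# Hypothetical initial concentrations, chosen only for the demonstration.
initial = {"RAF": 1.0, "RAFK": 0.5, "RAF-RAFK": 0.0, "RAF~{p1}": 0.0}
final = euler(initial)
print(final)
```

A useful sanity check on such a sketch is conservation: the total amount of RAFK (free plus bound) and of RAF (in all its forms) should stay constant along the trajectory.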

Slide 34

Bipartite Proteins-Reactions Graph of MAPK

GraphViz http://www.research.att.co/sw/tools/graphviz

Slide 35

Influence Graph inferred from the syntactical reaction model of the MAPK cascade

Negative feedback loops [Fages Soliman CMSB 06]

Slide 36

Differential Simulation

Slide 37

Boolean Simulation

Slide 38

Automatic Generation of CTL Properties

reachable(MAPK~{p1})
reachable(!(MAPK~{p1}))
oscil(MAPK~{p1})
reachable(MAPKPH-MAPK~{p1})
reachable(!(MAPKPH-MAPK~{p1}))
oscil(MAPKPH-MAPK~{p1})
AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPKPH,MAPKPH-MAPK~{p1}))
AG(!(MAPKPH-MAPK~{p1})->checkpoint(MAPK~{p1},MAPKPH-MAPK~{p1}))
reachable(MAPK~{p1,p2})
reachable(!(MAPK~{p1,p2}))
oscil(MAPK~{p1,p2})

Slide 39

Boolean Semantics

Associate:
- Boolean state variables to molecules, denoting the presence/absence of molecules in the cell or compartment
- A finite concurrent transition system [Shankar 93] to rules (asynchronous), over-approximating the set of all possible behaviors

A reaction A+B=>C+D is translated into 4 transition rules for the possibly complete consumption of reactants:
  A+B → A+B+C+D
  A+B → ¬A+B+C+D
  A+B → A+¬B+C+D
  A+B → ¬A+¬B+C+D

Slide 40

Kripke Structure K=(S,R)

Given:
- V, a set of state variables with domain D,
- T, a set of transition rules between states.

Associate a Kripke structure (S,R) where
- S = D^V is the set of possible states, with variables ranging in domain D,
- R ⊆ S×S is the total relation induced by T, that is:
  (A,B) is in R if there exists a transition rule from state A to B,
  (A,A) is in R if there exists no transition from state A.
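
The Boolean semantics and the Kripke structure construction can be sketched together in a few lines. The code below represents a Boolean state as the set of molecules that are present, generates for each reaction one successor per subset of reactants that may be completely consumed, and adds a self-loop on states without any transition so that the relation is total. The function names and the explicit enumeration of all states are assumptions made for illustration; a symbolic model checker would not enumerate states this way.

```python
from itertools import chain, combinations


def successors(state, reactants, products):
    """States reachable from `state` by one reaction under the Boolean semantics."""
    if not reactants <= state:
        return set()  # the reaction is not enabled: some reactant is absent
    # Every subset of the reactants may be completely consumed.
    consumed_options = chain.from_iterable(
        combinations(sorted(reactants), r) for r in range(len(reactants) + 1)
    )
    return {
        frozenset((state - set(consumed)) | products)
        for consumed in consumed_options
    }


def kripke_relation(states, reactions):
    """Total transition relation R over `states` induced by `reactions`."""
    relation = set()
    for s in states:
        nexts = set()
        for reactants, products in reactions:
            nexts |= successors(s, reactants, products)
        if not nexts:
            nexts = {s}  # no transition from s: add the loop (s, s)
        relation |= {(s, t) for t in nexts}
    return relation


# The reaction A+B => C+D over the four Boolean variables A, B, C, D.
molecules = ["A", "B", "C", "D"]
states = [
    frozenset(c)
    for r in range(len(molecules) + 1)
    for c in combinations(molecules, r)
]
reactions = [(frozenset("AB"), frozenset("CD"))]
relation = kripke_relation(states, reactions)

for t in sorted(successors(frozenset("AB"), *reactions[0]), key=sorted):
    print(sorted(t))
```

Starting from the state where only A and B are present, the four successors correspond to the four transition rules of the Boolean semantics for A+B=>C+D.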