tsvi tlusty, physical biology gidi lasovski
A simple model for the evolution of molecular codes driven by the interplay of accuracy, diversity and cost
Tsvi Tlusty, Physical Biology
Gidi Lasovski
The main idea
Understanding molecular codes, their evolution and the forces that affect them.
What is a molecular code?
The genetic code
The fitness of molecular codes
The evolution and emergence of molecular codes
Suggested experimental verification
The Central Dogma of Molecular Biology
1. A signaling protein (a transcription factor) binds to the regulatory region of a gene.
2. RNA polymerase transcribes the gene into mRNA.
3. The mRNA exits the cell nucleus.
4. A ribosome reads the mRNA and builds a protein, with the help of tRNAs.
The tRNAs supply the ribosome with amino acids, the building blocks of the protein.
What is a molecular code? The genetic code is a molecular code:
The symbols: A, U, C and G, read in triplets (codons)
The machine: RNA polymerase, signaling molecules (proteins), mRNA, ribosome
The output: proteins
The cost of operating the machine: ATP and tRNAs.
The symbols encode amino acids redundantly: 64 codons map to only 20 amino acids (plus stop signals). Is this redundancy there for robustness?
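To make the redundancy concrete, here is a small sketch that builds the standard codon table and measures how often a single-base misreading leaves the amino acid unchanged. The table string follows the standard genetic code (NCBI translation table 1, bases ordered U, C, A, G). The helper name `synonymous_fraction` is our own, not part of the original model.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated with bases in the order U, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fraction(code):
    """Fraction of all single-base substitutions that keep the same
    amino acid (or the same stop signal)."""
    same = total = 0
    for codon, aa in code.items():
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                mutant = codon[:pos] + b + codon[pos + 1:]
                total += 1
                same += code[mutant] == aa
    return same / total

counts = Counter(CODE.values())
print(len(CODE), "codons")
print(len(counts) - 1, "amino acids,", counts["*"], "stop codons")
print("codons for Leu:", counts["L"], "for Met:", counts["M"])
print("synonymous point substitutions: %.2f" % synonymous_fraction(CODE))
```

It reports 64 codons, 20 amino acids and 3 stop codons. Leucine has 6 codons and methionine only 1, so the redundancy is uneven across amino acids.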
The genetic code
[Figure: the codon table, with amino acids grouped as non-polar, polar, basic and acidic]
The genetic code - similarity
The fitness of molecular codes
Three parameters: error load, diversity and cost.
We define the fitness of the code as a linear combination of these three conflicting needs.
Error load
When reading a number, we can misread 3 as 8 (or vice versa) anywhere:
3838383838383838383838
here, or here. We want errors to be less likely where they matter more.
3838383838383838383838
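Here is one numerical reading of this example. It assumes that the positions that "matter more" are the high-order digits. A 3↔8 swap changes a digit by 5, so an error k places from the right shifts the value by 5·10^k. The two error profiles below are made up for illustration and have the same total error budget.

```python
def expected_damage(number, error_prob):
    """Expected change in value when a 3 or 8 may be misread as the other.

    error_prob[k] is a (hypothetical) misreading probability for the digit
    k places from the right; a 3<->8 swap changes that digit by 5, i.e.
    the value by 5 * 10**k."""
    return sum(error_prob[k] * 5 * 10**k
               for k, digit in enumerate(reversed(number)) if digit in "38")

number = "3838383838"
n = len(number)
uniform = [0.01] * n                                       # same rate at every digit
low_order = [0.02 if k < n // 2 else 0.0 for k in range(n)]  # errors only in the 5 lowest digits

print(expected_damage(number, uniform))     # errors spread over all digits
print(expected_damage(number, low_order))   # errors kept where they matter least
```

Both profiles make the same total number of errors. Confining them to the low-order digits lowers the expected damage by several orders of magnitude.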
Error load
Similar meanings should be assigned to similar (close) symbols, so that a small reading error causes only a small error in understanding.
If the symbol shown signifies the deviation in sugar level, which code would you prefer: A or B?
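Codes A and B from the slide are not reproduced in the text, so this sketch compares two hypothetical codes instead. Five symbols sit on a line, and each symbol can be confused with its immediate neighbor with probability `eps`; this neighbor-only error model is an assumption. The meanings are five sugar levels, and misinterpreting level α as β costs |α − β|. The function is the error load L = Σ r_ij p_iα p_jβ c_αβ of the model, specialized to a code in which each symbol has exactly one meaning.

```python
def reading_errors(n, eps):
    """r[i][j]: probability to read symbol i as j. Each symbol can be
    confused only with an immediate neighbor on a line (assumption)."""
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                r[i][j] = eps
        r[i][i] = 1.0 - sum(r[i])   # remaining probability: read correctly
    return r

def load(mapping, r):
    """Error load for a deterministic code: sum_ij r_ij * |m(i) - m(j)|,
    where m(i) is the sugar level signified by symbol i."""
    n = len(mapping)
    return sum(r[i][j] * abs(mapping[i] - mapping[j])
               for i in range(n) for j in range(n))

r = reading_errors(5, 0.1)
ordered = [0, 1, 2, 3, 4]    # neighboring symbols -> neighboring sugar levels
shuffled = [0, 3, 1, 4, 2]   # same meanings, scrambled order
print(load(ordered, r), load(shuffled, r))
```

The ordered code has a lower load. Its neighboring symbols differ by one sugar level, so a misreading only ever produces a small misunderstanding.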
Diversity
Diversity enables efficient and accurate delivery of different messages:
A small lack of sugar – I’m hungry
A medium lack of sugar – I’m starving
A large lack of sugar – Let’s go to San Martin NOW!
Diversity
Diversity enables the code to transmit as many different symbols as possible, analogous to the different symbols of a universal Turing machine (UTM):
Many different symbols – fewer states of the machine
More symbols also enable faster, more accurate control
Cost
Car insurance – the cost of improving the robustness of your driving
Another example is the price of ink and space in my demonstration
Cost
Strong binding costs more energy to create and to read.
The energy is proportional to the length of the binding site.
The binding probability scales like $e^{-E/T}$, so $E \sim -T \ln p$.
Notice that diversity has its costs as well: more symbols means longer molecules.
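The two cost effects on this slide can be sketched numerically under some stated assumptions. We read the Boltzmann scaling as "a wrong binding has probability p_err ~ exp(−E/T)". We also assume that the energy grows linearly with site length, at a hypothetical `eps` per base pair. Temperature is in units where T = 1. The diversity cost is plain counting: n distinct symbols need sites of at least log_4 n bases.

```python
import math

def energy_for_error(p_err, T=1.0):
    """Energy gap needed so that a wrong binding has probability
    p_err ~ exp(-E/T), i.e. E = -T ln(p_err) (assumed reading of the slide)."""
    return -T * math.log(p_err)

def site_length(E, eps=1.0):
    """Binding-site length if energy grows linearly with length,
    eps per base pair (hypothetical value)."""
    return math.ceil(E / eps)

def min_length_for_symbols(n_symbols, alphabet=4):
    """Shortest site that gives each of n_symbols a distinct DNA sequence."""
    length = 0
    while alphabet ** length < n_symbols:
        length += 1
    return length

for p_err in (1e-1, 1e-2, 1e-3):
    E = energy_for_error(p_err)
    print(f"p_err={p_err:g}: E={E:.2f} (units of T), site length {site_length(E)}")
for n in (2, 16, 20, 64, 100):
    print(f"{n} symbols need sites of at least {min_length_for_symbols(n)} bases")
```

Each tenfold drop in the error rate adds a fixed amount, T ln 10, to the required energy. The site length grows only logarithmically with the number of symbols.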
Summary
The code has to be optimized for a balance between error load, diversity and cost.
Quantifying the code
Using Lagrange multipliers:
$H = -L + w_D D - w_C C$
The cost $C$ is a reduction of entropy, so $w_C$ plays the role of temperature ($w_C C \sim T\,dS$).
$J/w_C = 1$ is the phase transition: "liquid" (the non-coding state) for $J/w_C < 1$, "solid" (the coding state) for $J/w_C > 1$.
ψ – the order parameter
H – the fitness
C – the cost
D – the diversity
L – the error load
The result is an Ising-like model.
Possible experiment
Take a bacterium with the transcription factor i. Duplicate the gene that codes for i, and call the duplicate j. Both i and j control the response to a signal A(t).
If A(t) fluctuates strongly, i and j may evolve two different meanings, giving better control.
If A(t) fluctuates weakly, one of them may be deleted.
Experiment around the critical point
Using Lagrange multipliers:
$H = -L + w_D D - w_C C$
$C$ is the reduction of entropy, so $w_C$ is equivalent to the temperature ($w_C C \sim T\,dS$).
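Diversity
$D = \sum_{i,j,\alpha,\beta} (1 - \delta_{ij})\, p_{i\alpha} p_{j\beta} c_{\alpha\beta}$
Error load
$L = \sum_{i,j,\alpha,\beta} r_{ij}\, p_{i\alpha} p_{j\beta} c_{\alpha\beta}$
Cost
$C = \sum_{i,\alpha} p_{i\alpha} \ln(p_{i\alpha}/p_\alpha)$
$E_{i\alpha} \sim \ln p_{i\alpha}$
$p_\alpha = n_s^{-1} \sum_j p_{j\alpha}$
$r_{ij}$ – the probability to read i as j
$p_{i\alpha}$ – the probability for i to be mapped to α
$c_{\alpha\beta}$ – the cost of misinterpreting α as β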
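The three sums translate directly into code. Here is a plain-Python sketch with matrices as nested lists: `r` is n_s×n_s, `p` is n_s×n_m and `c` is n_m×n_m. The function names and the toy-model parameter values are our own choices for illustration. The final lines evaluate the 1-bit toy model, for which the sums reduce by hand to L = c[1 − (1−2r)ψ²], D = c(1+ψ²) and C = (1+ψ)ln(1+ψ) + (1−ψ)ln(1−ψ).

```python
import math

def load(r, p, c):
    """Error load L = sum_{i,j,a,b} r_ij p_ia p_jb c_ab."""
    ns, nm = len(p), len(c)
    return sum(r[i][j] * p[i][a] * p[j][b] * c[a][b]
               for i in range(ns) for j in range(ns)
               for a in range(nm) for b in range(nm))

def diversity(p, c):
    """Diversity D = sum_{i != j, a, b} p_ia p_jb c_ab."""
    ns, nm = len(p), len(c)
    return sum(p[i][a] * p[j][b] * c[a][b]
               for i in range(ns) for j in range(ns) if i != j
               for a in range(nm) for b in range(nm))

def cost(p):
    """Cost C = sum_{i,a} p_ia ln(p_ia / p_a), with p_a = (1/n_s) sum_j p_ja.
    Terms with p_ia = 0 contribute nothing."""
    ns, nm = len(p), len(p[0])
    p_a = [sum(p[j][a] for j in range(ns)) / ns for a in range(nm)]
    return sum(p[i][a] * math.log(p[i][a] / p_a[a])
               for i in range(ns) for a in range(nm) if p[i][a] > 0)

def fitness(r, p, c, w_d, w_c):
    """H = -L + w_D D - w_C C."""
    return -load(r, p, c) + w_d * diversity(p, c) - w_c * cost(p)

# Usage: the 1-bit toy model (two symbols, two meanings); illustrative values.
r_err, c_val, w_d, w_c = 0.1, 1.0, 0.2, 0.5
R = [[1 - r_err, r_err], [r_err, 1 - r_err]]
C_MAT = [[0.0, c_val], [c_val, 0.0]]

def toy_p(psi):
    """Mapping p = (1/2)[[1+psi, 1-psi], [1-psi, 1+psi]]."""
    return [[(1 + psi) / 2, (1 - psi) / 2], [(1 - psi) / 2, (1 + psi) / 2]]

for psi in (0.0, 0.5, 0.9):
    print(psi, fitness(R, toy_p(psi), C_MAT, w_d, w_c))
```

In the non-coding state (ψ = 0) the cost vanishes, because each p_iα equals p_α.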
Additional slides for the mathematical model
$J = c\,(1 - 2r + w_D)$
$\psi^* = \tanh(J\psi^*/w_C)$
$H = J\psi^2 - w_C[(1 + \psi)\ln(1 + \psi) + (1 - \psi)\ln(1 - \psi)]$ (up to an additive constant)
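The self-consistency equation for the order parameter can be solved by simple fixed-point iteration. Here is a minimal sketch: starting from ψ > 0 it settles on ψ* = 0 when J/w_C < 1 and on a nonzero ψ* when J/w_C > 1. The starting value, tolerance and the sampled w_C values are arbitrary choices. Exactly at J/w_C = 1 the convergence becomes very slow (critical slowing down), so that point is left out.

```python
import math

def order_parameter(J, w_c, psi0=0.5, iters=10_000, tol=1e-12):
    """Solve psi = tanh(J * psi / w_c) by fixed-point iteration from psi0 > 0.

    For J / w_c < 1 the iteration decays to psi = 0 (non-coding, "liquid");
    for J / w_c > 1 it converges to a nonzero root (coding, "solid")."""
    psi = psi0
    for _ in range(iters):
        new = math.tanh(J * psi / w_c)
        if abs(new - psi) < tol:
            return new
        psi = new
    return psi

J = 1.0
for w_c in (2.0, 1.5, 1.1, 0.9, 0.5):
    print(f"J/w_C = {J / w_c:.2f}  psi* = {order_parameter(J, w_c):.4f}")
```

As w_C (the "temperature") drops below J, ψ* switches on continuously from zero. This is the same behavior as the magnetization of a mean-field Ising magnet.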
Quantifying the code
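$n_s$ symbols ($i, j, k, \ldots$) are mapped to $n_m$ meanings ($\alpha, \beta, \ldots$).
$p_{i\alpha}$ – the probability for i to be mapped to α, with $\sum_\alpha p_{i\alpha} = 1$
In the non-coding state the probability is constant: $p_{i\alpha} = 1/n_m$.
$r_{ij}$ – the probability to read i as j.
$c_{\alpha\beta}$ – the cost of misinterpreting α as β. The total error load:
$L = \sum_{i,j,\alpha,\beta} r_{ij}\, p_{i\alpha} p_{j\beta} c_{\alpha\beta}$
Just like a ferromagnet: r – the interaction, c – the magnitude, p – the spin.
The load also prefers specific symbols: the diagonal ($r_{ii}$) part of L vanishes only if symbol i signifies a single, specific meaning.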
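Here is a quick check of the last point. It is a small sketch that assumes c_αα = 0 (no cost for the intended meaning), with illustrative 2×2 matrices. The diagonal part of the load is zero for a code where each symbol has one meaning. It is positive in the non-coding state, where p_iα = 1/n_m.

```python
def diagonal_load(r, p, c):
    """Diagonal part of the load, sum_i r_ii sum_{a,b} p_ia p_ib c_ab:
    the load that remains even when symbol i is read correctly."""
    ns, nm = len(p), len(c)
    return sum(r[i][i] * p[i][a] * p[i][b] * c[a][b]
               for i in range(ns) for a in range(nm) for b in range(nm))

R = [[0.9, 0.1], [0.1, 0.9]]
C_MAT = [[0.0, 1.0], [1.0, 0.0]]        # assumption: c_aa = 0
specific = [[1.0, 0.0], [0.0, 1.0]]     # each symbol signifies one meaning
ambiguous = [[0.5, 0.5], [0.5, 0.5]]    # non-coding state, p_ia = 1/n_m
print(diagonal_load(R, specific, C_MAT), diagonal_load(R, ambiguous, C_MAT))
```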
Toy model (1 bit)
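The optimal code $p^*$ can be found from $\partial H_T/\partial p_{i\alpha} = 0$:
$p^*_{i\alpha} = z_i^{-1}\, p^*_\alpha \exp(-G_{i\alpha}/w_C)$, $z_i = \sum_\beta p^*_\beta \exp(-G_{i\beta}/w_C)$
$G_{i\alpha} = 2\sum_{j,\beta} \left(r_{ij} - w_D(1 - \delta_{ij})\right) p_{j\beta}\, c_{\alpha\beta}$
$c = \begin{pmatrix} 0 & c \\ c & 0 \end{pmatrix}$, $r = \begin{pmatrix} 1-r & r \\ r & 1-r \end{pmatrix}$, $p = \frac{1}{2}\begin{pmatrix} 1+\psi & 1-\psi \\ 1-\psi & 1+\psi \end{pmatrix}$
$\psi^* = \tanh(J\psi^*/w_C)$, $J = c\,(1 - 2r + w_D)$, $w_C^* = J = (1 - 2r + w_D)\,c$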
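The stationarity condition can also be used directly as an update rule. This sketch repeatedly applies p_iα ← p_α exp(−G_iα/w_C)/z_i to the 1-bit toy model, starting from a slightly biased code, and then compares the resulting ψ = p_11 − p_12 with the tanh equation. The plain iteration scheme, the step count and the parameter values are our own choices, not taken from the slides.

```python
import math

def g_matrix(r, p, c, w_d):
    """G_ia = 2 sum_{j,b} (r_ij - w_D (1 - delta_ij)) p_jb c_ab."""
    ns, nm = len(p), len(c)
    return [[2 * sum((r[i][j] - w_d * (i != j)) * p[j][b] * c[a][b]
                     for j in range(ns) for b in range(nm))
             for a in range(nm)] for i in range(ns)]

def update(r, p, c, w_d, w_c):
    """One step of p_ia <- p_a exp(-G_ia / w_C) / z_i."""
    ns, nm = len(p), len(c)
    p_a = [sum(p[j][a] for j in range(ns)) / ns for a in range(nm)]
    g = g_matrix(r, p, c, w_d)
    new = []
    for i in range(ns):
        weights = [p_a[a] * math.exp(-g[i][a] / w_c) for a in range(nm)]
        z = sum(weights)                  # normalization z_i
        new.append([x / z for x in weights])
    return new

def optimal_code(r, c, w_d, w_c, p0, steps=2000):
    """Iterate the update from an initial code p0."""
    p = p0
    for _ in range(steps):
        p = update(r, p, c, w_d, w_c)
    return p

# 1-bit toy model; r_err, c_val, w_d, w_c are illustrative values.
r_err, c_val, w_d, w_c = 0.1, 1.0, 0.2, 0.5
R = [[1 - r_err, r_err], [r_err, 1 - r_err]]
C_MAT = [[0.0, c_val], [c_val, 0.0]]
p0 = [[0.55, 0.45], [0.45, 0.55]]         # small initial bias, psi = 0.1
p_star = optimal_code(R, C_MAT, w_d, w_c, p0)
psi = p_star[0][0] - p_star[0][1]         # psi = p_11 - p_12
J = c_val * (1 - 2 * r_err + w_d)
print(psi, math.tanh(J * psi / w_c))
```

Here J/w_C = 2, so the small bias grows into a strongly coding state. With w_C above J the same iteration returns to the uniform code.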
General criteria
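The transition occurs when $Q_{i\alpha j\beta} = -\partial^2 H/\partial p_{i\alpha}\partial p_{j\beta}$ stops being positive definite, at
$w_C^* = 2\, n_m^{-1} (\lambda_r^* + w_D)\, |\lambda_c^*|$
$\lambda_r^*$ is the 2nd-largest eigenvalue of r.
$\lambda_c^*$ is the smallest eigenvalue of c; it corresponds to the longest wavelength, i.e. the smallest error load.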
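To see the criterion at work, this sketch assumes that symbols and meanings lie on a ring. That assumption makes r and c symmetric circulant matrices, whose eigenvalues have a closed form λ_k = Σ_m a_m cos(2πmk/n) with Fourier-mode eigenvectors of wavelength n/k. The ring geometry, the neighbor-confusion rate and the distance-based cost are illustrative assumptions. For the 2×2 toy model the formula reproduces w_C* = J.

```python
import math

def circulant_eigenvalues(first_row):
    """Eigenvalues of a symmetric circulant matrix given its first row:
    lambda_k = sum_m a_m cos(2 pi m k / n); mode k has wavelength n / k."""
    n = len(first_row)
    return [sum(a * math.cos(2 * math.pi * m * k / n)
                for m, a in enumerate(first_row)) for k in range(n)]

def critical_w_c(r_row, c_row, w_d):
    """w_C* = 2 n_m^{-1} (lambda_r* + w_D) |lambda_c*|, with lambda_r* the
    2nd-largest eigenvalue of r and lambda_c* the smallest eigenvalue of c."""
    lam_r = sorted(circulant_eigenvalues(r_row), reverse=True)[1]
    lam_c = min(circulant_eigenvalues(c_row))
    n_m = len(c_row)
    return 2 / n_m * (lam_r + w_d) * abs(lam_c)

# 1-bit toy model: r = 0.1, c = 1, w_D = 0.2, so J = c(1 - 2r + w_D) = 1.
print("toy model:", critical_w_c([0.9, 0.1], [0.0, 1.0], 0.2))

# Ring of 6: each symbol confused with either neighbor (eps each);
# misinterpretation cost = distance between meanings around the ring.
eps = 0.1
R6 = [1 - 2 * eps, eps, 0.0, 0.0, 0.0, eps]
C6 = [0, 1, 2, 3, 2, 1]
print("eigenvalues of c:", [round(x, 3) for x in circulant_eigenvalues(C6)])
print("ring of 6:", critical_w_c(R6, C6, 0.2))
```

On the ring, the most negative eigenvalue of c belongs to k = 1, the mode with the longest wavelength. This matches the remark that λ_c* picks out the smoothest, lowest-load arrangement of meanings.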