conservation of codon optimality
DESCRIPTION
Group presentation covering:
- the basics of translation
- experimental evidence that proteins produced from synonymous mRNA sequences differ
- a hypothesis for how synonymous codons affect the resulting protein structure
- the methodology I use to test for the conservation of codon choice within related proteins
TRANSCRIPT
Conservation of codon optimality within families
Alistair Martin, Charlotte Deane
Renaturation
“The original structure of some proteins can be regenerated upon removal of the denaturing agent and restoration of conditions favouring the native state. Proteins subject to this process, called renaturation, include serum albumin from blood, hemoglobin (the oxygen-carrying pigment of red blood cells), and the enzyme ribonuclease”
- Encyclopedia Britannica
All the information is contained in the protein sequence!
Who cares about degeneracy?!
Question - Experimental “Oddities”
Synonymous switches have an effect:
● Can cause exons to be skipped
● Can cause a reduction in activity
● Can cause misfolding
Answer - Cotranslational folding
Prior Work
“N-terminal regions are generally translated slower than C-terminal regions”
- Saunders & Deane (2010)
“the first 5-10 codons of protein-coding genes are often codons that are less frequently used in the rest of the genome”
- Bentele et al. (2013)
“cell cycle-regulated genes expressed in different phases display different codon preferences”
- Morgenstern et al. (2012)
Starting point - CSandS (2010)
Mapping of mRNA seq to protein seq
● 4000+ matches
● High quality
● Human curated
● Structural Information
● Taxa Information
● Bad documentation
Saunders R, Deane CM, Nucleic Acids Res., 2010, 38(19), 6719-28.
Modifying the database
Added
● SCOP families (SCOP 1.75B)
● tRNA gene copy # (GtRNAdb)
● SCOP family structural alignment (MAMMOTH-Mult)
Removed
● Enforce 40% seq id
● NMR experiments
● Minimum of 7 in SCOP family
● Organisms without tRNA data
● Misaligned families
Database Stats
SCOP families: 43
Structural Domains: 454
Scoring a SCOP family (1)
Protein Sequence
pdb-1 (HUMAN)  V F T V E V K N Y G
pdb-2 (ECALL)  V Y N V Y V R - N G
pdb-3 (HUMAN)  K Y K A E W R A V G
pdb-4 (YEAST)  - - - - D V P G D R
mRNA Sequence
pdb-1 (HUMAN)  ACU GUU GAA GUC AAA AAC UAC GGA
pdb-2 (ECALL)  AAU GUA UAU GUU CGA --- AAC GGA
pdb-3 (HUMAN)  AAG GCC GAG UGG CGU GCU GUG GGC
pdb-4 (YEAST)  --- --- GAU GUG CCA UGU GAC AGG
Structural alignment produced by MAMMOTH-mult on SCOP family domain fragments
Known mRNA sequence mapped onto alignment
Mapping mRNA
One to one matching of codons to amino acids.
100% coverage by mRNA sequence
Codon > amino acid if any difference
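The mapping step can be sketched in a few lines of Python. This is only an illustration, not the CSandS code: the function name `map_codons` and the gap symbols are assumptions, and the sketch threads codons onto the alignment by position, so the codon is kept even where it disagrees with the listed residue.

```python
def map_codons(aligned_protein, mrna):
    """Thread the codons of an ungapped coding sequence onto a gapped,
    structurally aligned protein row (one codon per aligned residue)."""
    if len(mrna) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    # Require 100% coverage: exactly one codon per aligned residue.
    if len(codons) != n_residues:
        raise ValueError("mRNA does not cover the aligned residues one to one")
    remaining = iter(codons)
    # Gap columns in the protein row become gap codons in the mRNA row.
    return ["---" if aa == "-" else next(remaining) for aa in aligned_protein]


# Last eight columns of the pdb-2 (ECALL) example row.
print(map_codons("NVYVR-NG", "AAUGUAUAUGUUCGAAACGGA"))
```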
Scoring a SCOP family (2)
mRNA Sequence
pdb-1 (HUMAN)  ACU GUU GAA GUC AAA AAC UAC GGA
pdb-2 (ECALL)  AAU GUA UAU GUU CGA --- AAC GGA
pdb-3 (HUMAN)  AAG GCC GAG UGG CGU GCU GUG GGC
pdb-4 (YEAST)  --- --- GAU GUG CCA UGU GAC AGG
Translation Scores
pdb-1 (HUMAN)  0.3 0.9 0.1 0.6 0.4 0.1 0.8 0.6
pdb-2 (ECALL)  0.5 0.8 0.4 0.9 0.5 --- 0.6 0.5
pdb-3 (HUMAN)  0.6 0.6 0.1 0.6 0.9 0.2 0.1 0.1
pdb-4 (YEAST)  --- --- 0.2 0.7 0.4 0.1 0.7 0.5
Organism specific translation speed scores given to each codon. Profile is then smoothed.
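As a rough sketch of this step, each codon can be looked up in an organism-specific weight table and the resulting profile smoothed. The slides do not say which smoothing is used, so the centred moving average and its window of 3 below are assumptions, as are the function names and the toy weights.

```python
def score_profile(codon_row, weights):
    """Look up a per-codon speed score; gap codons stay unscored (None)."""
    return [None if codon == "---" else weights[codon] for codon in codon_row]


def smooth(profile, window=3):
    """Centred moving average that skips gaps (assumed smoothing scheme)."""
    half = window // 2
    smoothed = []
    for i, value in enumerate(profile):
        if value is None:
            smoothed.append(None)
            continue
        neighbours = [v for v in profile[max(0, i - half):i + half + 1]
                      if v is not None]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


# Toy weights for illustration only; real values would come from the
# organism's scoring table.
toy_weights = {"AAU": 0.5, "GUA": 0.8, "UAU": 0.4}
print(smooth(score_profile(["AAU", "GUA", "---", "UAU"], toy_weights)))
```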
Translation Speed Scores
Using the tRNA Adaptation Index (tAI).
This is determined by:
- tRNA gene copy number
- Simple Crick’s wobble pairing
Other scoring systems exist.
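A simplified sketch of how the two ingredients combine into per-codon weights is shown below. It is written in the spirit of tAI rather than as a faithful implementation: the wobble penalty of 0.5 is a placeholder (the published index uses fitted penalties), only G:U and U:G wobble pairs are modelled, inosine is ignored, and the copy numbers in the example are invented.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Anticodon first base -> codon third base it can also read by wobble.
WOBBLE = {"G": {"U"}, "U": {"G"}}


def pairing_penalty(codon, anticodon, wobble_penalty=0.5):
    """Return 0.0 for a Watson-Crick match, wobble_penalty for a wobble
    match at the third codon position, or None if the tRNA cannot read it."""
    a1, a2, a3 = anticodon  # written 5'->3'; a1 pairs with codon position 3
    n1, n2, n3 = codon
    if COMPLEMENT[a3] != n1 or COMPLEMENT[a2] != n2:
        return None
    if COMPLEMENT[a1] == n3:
        return 0.0
    if n3 in WOBBLE.get(a1, ()):
        return wobble_penalty
    return None


def codon_weights(trna_copies, codons, wobble_penalty=0.5):
    """Sum penalised tRNA gene copy numbers per codon, scaled to a max of 1."""
    raw = {}
    for codon in codons:
        total = 0.0
        for anticodon, copies in trna_copies.items():
            s = pairing_penalty(codon, anticodon, wobble_penalty)
            if s is not None:
                total += (1.0 - s) * copies
        raw[codon] = total
    top = max(raw.values())
    return {c: (w / top if top else 0.0) for c, w in raw.items()}


# Invented copy numbers: anticodon GAA reads UUC and (by wobble) UUU.
print(codon_weights({"GAA": 10, "UAA": 4}, ["UUU", "UUC", "UUA", "UUG"]))
```

Codons read by many tRNA gene copies through exact pairing score near 1, while codons served only by wobble pairing or by rare tRNAs score lower.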
Scoring a SCOP family (3)
Optimality Thresholds
Determined using the organism specific open reading frames within database.
Manually specified thresholds.
Issues with organisms present in low frequency.
Translation Scores
pdb-1 (HUMAN)  0.3 0.9 0.1 0.6 0.4 0.1 0.8 0.6
pdb-2 (ECALL)  0.5 0.8 0.4 0.9 0.5 --- 0.6 0.5
pdb-3 (HUMAN)  0.6 0.6 0.1 0.6 0.9 0.2 0.1 0.1
pdb-4 (YEAST)  --- --- 0.2 0.7 0.4 0.1 0.7 0.5
Optimality Scores
pdb-1 (HUMAN)   0 +1 -1  0  0 -1 +1  0
pdb-2 (ECALL)   0 +1  0 +1  0 --  0  0
pdb-3 (HUMAN)   0  0 -1  0 +1 -1 -1 -1
pdb-4 (YEAST)  -- -- -1  0  0 -1  0  0
Organism specific thresholds determine which codons are optimal (+1) , nonoptimal (-1), or neither (0).
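The assignment itself is a simple three-way classification. In the sketch below the `THRESHOLDS` table and its values are hypothetical: 0.2 and 0.8 were picked only because they reproduce the HUMAN rows of the example, and are not the thresholds used in the study.

```python
# Hypothetical (low, high) cut-offs per organism, for illustration only.
THRESHOLDS = {"HUMAN": (0.2, 0.8)}


def optimality(profile, organism, thresholds=THRESHOLDS):
    """Label each codon +1 (optimal), -1 (nonoptimal) or 0 (neither);
    gap positions stay None."""
    low, high = thresholds[organism]
    labels = []
    for score in profile:
        if score is None:
            labels.append(None)
        elif score >= high:
            labels.append(1)
        elif score <= low:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# pdb-1 (HUMAN) translation scores from the example.
print(optimality([0.3, 0.9, 0.1, 0.6, 0.4, 0.1, 0.8, 0.6], "HUMAN"))
```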
Scoring a SCOP family (4)
Conservation Scores
Simple codon-wise average of optimality scores.
Must have at least 5 codons in an aligned column.
Randomisation of optimality scores produces SCOP-family-specific thresholds (5%).
Optimality Scores
pdb-1 (HUMAN)   0 +1 -1  0  0 -1 +1  0
pdb-2 (ECALL)   0 +1  0 +1  0 --  0  0
pdb-3 (HUMAN)   0  0 -1  0 +1 -1 -1 -1
pdb-4 (YEAST)  -- -- -1  0  0 -1  0  0
Conservation Scores
SCOP family specific thresholds determine optimal (red) and nonoptimal (blue) conserved codons.
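A minimal sketch of the column averaging and the randomisation follows. The slides do not spell out the randomisation scheme, so two assumptions are made here: optimality scores are shuffled within each sequence (gaps stay in place), and the 5% cut-off is applied to each tail of the resulting null distribution. All function names are illustrative.

```python
import random


def column_scores(rows, min_codons=5):
    """Average optimality per aligned column; columns with fewer than
    min_codons scored codons are left unscored (None)."""
    scores = []
    for column in zip(*rows):
        values = [v for v in column if v is not None]
        scores.append(sum(values) / len(values)
                      if len(values) >= min_codons else None)
    return scores


def shuffled(row, rng):
    """Shuffle the scores of one sequence while keeping gaps in place."""
    values = [v for v in row if v is not None]
    rng.shuffle(values)
    remaining = iter(values)
    return [None if v is None else next(remaining) for v in row]


def family_thresholds(rows, n_iter=1000, tail=0.05, min_codons=5, seed=0):
    """Build a null distribution of column scores and return the lower
    and upper cut-offs at the requested tail fraction."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_iter):
        permuted = [shuffled(row, rng) for row in rows]
        null.extend(s for s in column_scores(permuted, min_codons)
                    if s is not None)
    if not null:
        raise ValueError("no column has enough codons to score")
    null.sort()
    k = int(tail * (len(null) - 1))
    return null[k], null[-1 - k]
```

Columns scoring at or above the upper cut-off would be called conserved optimal, and those at or below the lower cut-off conserved nonoptimal. Note that the four-sequence example on the slide would not pass the minimum of 5 codons per column; it is shown for illustration only.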
Scoring a fold family - Summary
Structural Alignment
Conserved Codons
1. Map mRNA Seq.
2. Attribute translation speed scores to each Codon.
3. Assign optimal, non-optimal or neither to each codon.
4. Determine conservation scores for each column.
Scoring a fold family - Result
Is there any conservation?
How many SCOP families have more conserved residues than expected by chance?
Optimality Assignment Thresholds
Looking forward
● Remove signal from conserved residues
● Correlation to structural features
● Update the CSandS database
● Investigate the ribosome tunnel
● Subgroup analysis - renaturation, chaperone
Questions?