intrinsic disorder and the evolution of overlapping genes
DESCRIPTION
Presentation to the 2009 Biophysical Society Meeting. Protein intrinsic disorder and overlapping genes. Evolution.
TRANSCRIPT
Intrinsic Disorder and the Evolution of Overlapping Genes
THE TEAM: Pedro R. Romero1, Corinne Rancurel2, Mahvash Khosravi1, Keith Dunker1, and David Karlin3
1 Indiana University - Purdue University Indianapolis, Indianapolis, IN, USA
2 Architecture et Fonction des Macromolécules Biologiques, Campus de Luminy, Marseille, France
3 Tous Chercheurs, Inmed, Parc de Luminy, 13273 Marseille Cedex 09, France
Overlapping Genes and “Overprinting”
• Overlapping genes discovered in first sequenced genome (Phage Φ-X174, Sanger, 1977)
• Many theoretical studies in the ’70s
– Constrained evolution demonstrated
– Limited information content/gene (Yockey, 78)
– Predicted difference in evolutionary speed (Smith & Waterman, 78)
• “Overprinting” of older genes proposed as mechanism for de novo gene generation (Keese & Gibbs, 1992)
• Structural effects noticed in isolated examples: some recent ones shown to be related to intrinsic disorder.
Overprinting: Creation of a novel C-terminal extension
Overprinting Examples
Overprinting and Structure Example: Measles virus (Karlin et al., 2002)
Figure 1. A schematic view of the measles virus. The nucleocapsid protein (N) assembles into a cylindrical capsid that wraps the virus’ RNA. The nucleocapsid protein has a disordered tail where the phosphoprotein (P) binds. The P protein is largely disordered, and it is encoded by a multiple-encoding area of the RNA (see Figure 2).
Figure 2. Schematic view of the RNA region that encodes the P, V, and C proteins in the measles virus genome. The color lines at the bottom of the diagram represent the three RNA reading frames. The phosphoprotein (P) and the N-terminus of the V protein are encoded on frame 1 (blue), the C-terminus of the V protein is encoded on frame 2 (red), and the entire C protein is encoded on frame 3 (green). The colored cylinders at the top represent the encoded protein products, with the narrow cylinders denoting disordered regions and the wide cylinders representing ordered regions. Notice that no region in the RNA encodes two ordered protein domains.
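The three reading frames described in Figure 2 can be illustrated with a short sketch. It translates one RNA strand in each of the three forward frames using the standard genetic code. The function names and the toy sequence are illustrative assumptions; the sequence is made up and is not taken from the measles virus genome.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(rna, frame):
    """Translate one forward reading frame (0, 1, or 2); '*' marks a stop codon."""
    dna = rna.upper().replace("U", "T")
    # Step through complete codons only, starting at the frame offset.
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Unknown codons (e.g. containing N) are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(rna):
    """Return the translations of the three forward reading frames."""
    return [translate_frame(rna, frame) for frame in range(3)]


# Toy sequence (hypothetical, for illustration only).
toy_rna = "AUGGCUUGACC"
for frame, protein in enumerate(three_frames(toy_rna), start=1):
    print(f"frame {frame}: {protein}")
```

The same stretch of nucleotides yields three unrelated amino acid sequences, which is why a mutation in an overlapping region can affect two protein products at once.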
Intrinsically Disordered Proteins
• Intrinsically disordered proteins (IDPs) and regions are less sensitive to evolutionary changes than ordered ones
• They evolve faster than ordered proteins or regions
• They are more dependent on residue content than on residue position
• They mostly participate in regulatory and signaling functions
Structural analysis of viral overlapping genes
Hypotheses
1. Intrinsic disorder might help alleviate evolutionary constraints in overlapping genes
2. New, overprinted genes will likely tend to be more disordered than their ancestral counterparts
Structural analysis of viral overlapping genes
• Need for real, expressed genes
• Spliced genes excluded (many unannotated splicing events)
• Assembled data from 43 viral genomes
– Unspliced RNA virus
– Unspliced retroid virus (RNA and DNA)
– Overlaps > 90 nucleotides (30 residues)
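The overlap-length criterion above can be sketched as a simple coordinate filter. This is a minimal illustration, not the pipeline used for the study: the `Gene` record, the function names, and the example coordinates are hypothetical. The sketch also ignores strand and reading frame, which a real analysis would have to check.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive nucleotide coordinate
    end: int    # 1-based, inclusive nucleotide coordinate


MIN_OVERLAP_NT = 90  # threshold from the slide: overlaps > 90 nucleotides


def overlap_length(a, b):
    """Number of nucleotides shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def qualifying_overlaps(genes, min_nt=MIN_OVERLAP_NT):
    """Return (name_a, name_b, overlap_nt) for every pair overlapping by more than min_nt."""
    pairs = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            shared = overlap_length(a, b)
            if shared > min_nt:
                pairs.append((a.name, b.name, shared))
    return pairs


# Hypothetical coordinates, for illustration only.
example_genes = [Gene("A", 1, 300), Gene("B", 200, 500), Gene("C", 450, 600)]
print(qualifying_overlaps(example_genes))
```

With these made-up coordinates, only the A/B pair passes the filter; the B/C overlap is too short.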
Analysis of viral overlapping genes
Table 2. Predicted order/disorder statistics on overlapping genes data set.

Measure of order content              Fraction   68% CI (1 std. dev.)   95% CI
Fraction of sequence predicted ordered
  Entire data set                     71%        68-74%                 66-76%
  Encoded by overlapping genes        52%        48-57%                 45-60%
Fraction of overlapping sequence positions predicted ordered on both protein products
  Expected                            50%        46-55%                 44-58%
  Observed                            28%        23-33%                 19-36%
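The expected-versus-observed comparison in Table 2 can be sketched as follows. The slides do not say how the expected value was derived, so this sketch assumes the two protein products are ordered independently, each with the background ordered fraction of the whole data set. Under that assumption the expectation is the square of the background fraction, and 0.71 squared is about 0.50, which is consistent with the table. The per-position prediction strings ('O' for ordered, 'D' for disordered) and the function names are hypothetical.

```python
def observed_order_order(pred_a, pred_b):
    """Fraction of overlap positions predicted ordered ('O') on both protein products.

    pred_a and pred_b are per-position predictions for the two products,
    already aligned over the overlapping region (an assumption of this sketch).
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same overlap positions")
    if not pred_a:
        raise ValueError("overlap must contain at least one position")
    both_ordered = sum(1 for x, y in zip(pred_a, pred_b) if x == "O" and y == "O")
    return both_ordered / len(pred_a)


def expected_order_order(background_ordered_fraction):
    """Expected order-order fraction if both products were ordered independently."""
    return background_ordered_fraction ** 2


# Toy predictions, for illustration only.
print(observed_order_order("OODD", "ODOD"))
# Background ordered fraction for the entire data set, taken from Table 2.
print(round(expected_order_order(0.71), 2))
```

An observed fraction well below the independent expectation, as in the table, suggests that positions ordered in one product tend to be disordered in the other.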
[Bar chart: fraction of overlapping positions predicted ordered on both products (Fraction O-O, axis 0 to 0.6), Expected vs. Observed]
Difference in disorder content between entire data set and overlapping regions is significant (p-value 3 × 10^-57)
Difference between expected and observed fraction of order-order overlaps is also significant (p-value 5 × 10^-24)
Structural and functional organization of ancestral/novel proteins
• Overprinting (novel) proteins
– Most are orphans (no homologs outside of genus)
– Mostly disordered
– Mostly accessory proteins
– Proteins created by overprinting different homologs of the same gene display a wide diversity of functional and structural features
Disorder and evolutionary constraints
[Bar chart: % disorder in overlapping region (y-axis, 0% to 100%) by virus (x-axis: HBV, Sendai, SIV, HTLV, phiX174 A-B, phiX174 D-E, PLRV, HPV, CLCuV AC1-AC4, CLCuV CP-AV2); two series: more constrained and less constrained]
• The comparison shows that the less constrained (faster evolving) protein in a pair tends to be more disordered
• The only exceptions occur when disorder content is very similar between the two overlapping proteins
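The pairwise comparison summarized above can be sketched as a small tabulation. For each overlapping pair it checks whether the less constrained protein is the more disordered one, and flags pairs whose disorder contents are close. The function name, the similarity threshold of 0.05, and the example values are hypothetical and do not come from the slides.

```python
def summarize_pairs(pairs, similar_threshold=0.05):
    """Summarize disorder differences within overlapping protein pairs.

    pairs: iterable of (label, disorder_more_constrained, disorder_less_constrained),
    with disorder given as a fraction between 0 and 1.
    similar_threshold: hypothetical cutoff below which two disorder
    contents are treated as "very similar".
    """
    rows = []
    for label, more_constrained, less_constrained in pairs:
        difference = less_constrained - more_constrained
        rows.append({
            "pair": label,
            "difference": difference,
            # True when the less constrained protein is the more disordered one.
            "follows_trend": difference > 0,
            "similar": abs(difference) <= similar_threshold,
        })
    return rows


# Made-up values, for illustration only (not read from the chart).
example = [("pair 1", 0.20, 0.80), ("pair 2", 0.50, 0.48)]
for row in summarize_pairs(example):
    print(row)
```

In this toy input the second pair breaks the trend, but its two disorder contents are nearly equal, which mirrors the kind of exception described in the bullet above.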
Conclusions
• Both proposed hypotheses supported:
– Disorder appears to alleviate evolutionary constraints
– Novel overprinted genes tend to be disordered
• New directions: Show whether novel genes tend to be disordered “at birth” or are selected to be disordered
THANKS!!