
Post on 22-Dec-2015


V. Evolution of Protein Structure and Function

•Protein structure classification

•Structural relationships among homologous proteins

•Changes in proteins during evolution uncover functionally/structurally important amino acid sites

•Domain swapping

•Classification of protein folding patterns

•How do proteins evolve new functions?

•Classification of protein functions

Super Secondary Structures (I)

• Hairpins connect two antiparallel strands;

• Cross-overs connect two parallel β-strands, most commonly through an α-helix (β-α-β topology). Nearly all cross-overs are right-handed: when the C-terminal strand is placed closer to the viewer and pointing right, the connecting α-helix or loop lies on top of the sheet.

[Figure: right-handed cross-over and left-handed cross-over, with the two strands numbered 1 and 2]

Super Secondary Structures (II)

• A coiled coil is a common α-helical structure found in proteins that participate in protein folding and protein-protein interactions.

– Heptad repeat (a-b-c-d-e-f-g)n, where positions a and d are nonpolar, which gives each helix a hydrophobic side

• Helix bundles are three or more helices packed together.

– Knobs-into-holes packing: in both kinds of helix packing, slight distortion of the individual helices and the inclination of their axes with respect to each other allow the side chains of the nonpolar residues to mesh together
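The heptad pattern can be made concrete with a short sketch. It assigns each residue a heptad position and checks how nonpolar the a and d positions are. The sequence is a made-up, idealised repeat (not a real protein), the register is assumed to start at position a, and the set of residues counted as nonpolar is an illustrative choice.

```python
# Minimal sketch: label residues with heptad positions (a-g) and measure
# how nonpolar chosen positions are.
# ASSUMPTIONS: toy sequence, register starts at 'a', illustrative nonpolar set.

HEPTAD = "abcdefg"
NONPOLAR = set("AVLIMFWC")  # illustrative choice, not a standard definition


def heptad_positions(sequence, start="a"):
    """Return a string giving the heptad position of each residue."""
    offset = HEPTAD.index(start)
    return "".join(HEPTAD[(offset + i) % 7] for i in range(len(sequence)))


def nonpolar_fraction(sequence, positions, wanted):
    """Fraction of residues at the wanted heptad positions that are nonpolar."""
    chosen = [aa for aa, pos in zip(sequence, positions) if pos in wanted]
    if not chosen:
        return 0.0
    return sum(aa in NONPOLAR for aa in chosen) / len(chosen)


# Hypothetical idealised coiled-coil segment: three identical heptads.
toy_sequence = "LEELKKK" * 3
positions = heptad_positions(toy_sequence)

core_fraction = nonpolar_fraction(toy_sequence, positions, "ad")
surface_fraction = nonpolar_fraction(toy_sequence, positions, "bcefg")
```

In this toy case the a and d positions are entirely nonpolar and the remaining positions are entirely polar, which is the stripe that packs against the partner helix. Real sequences are noisier and the register must be inferred rather than assumed.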

β-barrels

It is like a sheet wrapped around a cylinder

The Hierarchical nature of protein architecture

• Primary structure

– Proteins are first synthesized as linear sequences of amino acids

• Secondary structure

– The linear sequence can undergo simple packing in regions of local regularity, i.e., α-helices, β-strands, β-sheets and β-turns

• Super-secondary structure

– The packing of secondary structure elements into stable units, e.g., β-barrels, β-α-β units, Greek keys, etc.

• Tertiary structure

– The complex folding of packed secondary structures gives the tertiary structure of the protein

Most proteins fold their secondary structure elements together so as to protect hydrophobic regions (tertiary structure)

• Quaternary structure

– The arrangement of separate chains within a protein that has more than one subunit, e.g., haemoglobin

Some proteins work as multi-subunit machines and must assemble at the quaternary level.

• Quinternary structure

– The arrangement of separate molecules, such as in protein-protein or protein-nucleic acid interactions

The highest level of organisation is the quinternary structure

Protein domains: compact units within the folding pattern of single chains, that look as if they should have independent stability

Modular proteins are multidomain proteins which often contain many copies of closely related domains.

A domain is a compact, semi-independent region of 100-150 amino acids that has a hydrophobic core and a hydrophilic exterior. Domains can be structural and/or functional

Bundle structural domain

β-barrel structural domain

Glyceraldehyde-3-phosphate dehydrogenase has two functional domains

Glyceraldehyde-3-phosphate binding domain

NAD+ binding domain

Quaternary Structure

Spatial arrangement of protein subunits and the nature of their contacts.

Hemoglobin Tetramer

Immunoglobulin Quaternary Structure

Evolutionary changes in protein sequences

Events responsible for the generation of diversity:

- mutations

- insertions

- deletions

- transposition of large DNA pieces

Selection acts on protein function, which is determined by protein structure

A mutant gene may encode a protein with:

- equivalent function (neutral mutation)

- new and optimised function (adaptive evolution)

- new and sub-optimised function (purifying selection)

Evolution and proteins

• You can see the effects of evolution, not only in the whole organism, but also in its molecules - DNA and protein

• For a mutation to have an effect on the phenotype (and be subject to selection) it must (usually) affect the structure or function of a protein

• You can learn a lot about evolution by studying the structure of proteins

Evolution in a population may occur through positive or negative selection or through the neutral fixation of protein-function variants

Proteins from different species have similar but not identical sequences. This fact implies that they have similar but not identical protein structures

Gilbert maintained that exons represent structural components of proteins that can be recombined in different contexts, as a mechanism for generating new protein folds.

This suggestion could not be supported below the protein domain level

A table of aligned amino acid sequences is a very useful tool for evolutionary studies and provides more information than structure does

The pattern of variation at the amino acid level gives clues to the selective constraints operating on the sequence or even on the protein structure

It is possible to construct phylogenetic trees derived from tabulations of related sequences.

Phylogenies derived from different families of proteins from the same range of species are mutually consistent in their branching order

To infer phylogenetic relationships between species through genes it is important to choose functionally equivalent proteins

One of the hypotheses that have gained much attention is the molecular clock hypothesis, which suggests that amino acid substitutions proceed at a constant rate within protein families

A molecular clock

• Plot the number of amino acid changes between the same protein in different species (such as cytochrome c) against the time since the species diverged

• The plot gives a straight line, so the evolution of a protein sequence proceeds at a constant rate and can therefore be used as a clock

Calibrating the clock for specific protein families would allow the dating of biological events not present in the fossil record, and would imply that the changes are non-adaptive because they are independent of selective constraints
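A minimal sketch of how such a calibration could work: fit a straight line through the origin to (divergence time, number of changes) points, then use the fitted rate to date a pair of species with no fossil calibration. All numbers below are hypothetical and chosen to be perfectly linear for illustration; they are not measurements for cytochrome c or any real protein.

```python
# Minimal molecular-clock sketch: changes = rate * time.
# ASSUMPTIONS: hypothetical calibration data, a line forced through the
# origin, and no correction for multiple substitutions at the same site.


def fit_clock_rate(times, changes):
    """Least-squares slope of a line through the origin."""
    numerator = sum(t * c for t, c in zip(times, changes))
    denominator = sum(t * t for t in times)
    return numerator / denominator


def estimate_divergence_time(observed_changes, rate):
    """Invert the clock: time = changes / rate."""
    return observed_changes / rate


# Hypothetical calibration points.
times_myr = [100.0, 200.0, 300.0, 400.0]    # millions of years since divergence
changes_per_100 = [5.0, 10.0, 15.0, 20.0]   # amino acid changes per 100 residues

rate = fit_clock_rate(times_myr, changes_per_100)

# Date a hypothetical species pair that differs at 12.5 sites per 100 residues.
estimated_time = estimate_divergence_time(12.5, rate)
```

With these toy values the fitted rate is about 0.05 changes per 100 residues per million years, and the undated pair comes out at about 250 million years. Forcing the line through the origin reflects the idea that identical sequences correspond to zero divergence time.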

Variability of selective constraints in protein molecules

Amino acid substitution rates vary between:

- different protein families

- proteins within the same family

- amino acid regions in the same protein

The main reason for the variability in substitution rates among amino acid regions is that different sites are under different functional and structural constraints

Sites that play a less important functional or structural role can fix a greater number of mutations, because changes there have a neutral effect on the biological fitness of the protein
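One simple way to see site-to-site differences in constraint is to count how many distinct residues appear in each column of an alignment. The sketch below does this for a toy alignment; the sequences are hypothetical, are assumed to be already aligned without gaps, and the count of distinct residues is only a crude stand-in for a substitution rate.

```python
# Minimal sketch: per-column variability in a toy alignment.
# ASSUMPTIONS: hypothetical sequences, already aligned, no gap characters.


def column_variability(alignment):
    """Number of distinct residues observed in each alignment column."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [len(set(column)) for column in zip(*alignment)]


def invariant_sites(alignment):
    """0-based indices of columns where every sequence has the same residue."""
    return [i for i, n in enumerate(column_variability(alignment)) if n == 1]


toy_alignment = [
    "GDVEKGKK",
    "GDVAKGKK",
    "GDIEKGAK",
    "GNVERGSK",
]

variability = column_variability(toy_alignment)
conserved = invariant_sites(toy_alignment)
```

Columns with a single residue are candidates for strong functional or structural constraint, while columns with many different residues are candidates for sites where changes are close to neutral.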

Evolution of protein structure

In families of closely related proteins, mutations alter the specificity of proteins rather than changing their structure

- family of serine proteinases

- specificity of haemoglobin for other ligands

In very few cases point mutations alter the protein in such a way that novel functions arise; the chymotrypsin family of serine proteinases is a clear example of the emergence of novel functions:

- Haptoglobin, a chymotrypsin homologue without proteolytic activity, acts as a chaperone, preventing protein aggregation

- The serine proteinase of rhinoviruses forms the initiation complex of RNA synthesis

Amino acid changes

[Diagram: non-synonymous nucleotide substitutions lead to amino acid replacements and to changes in protein function or structure; amino acid changes may evolve under neutrality (neutral theory of molecular evolution), purifying selection or positive selection]

Neutral evolution vs selection

[Figure: biological fitness (W) under positive and negative selection, shown for a one-sequence scenario and a population scenario; the panels trace example nucleotide sequences and the amino acids they encode]

One sequence scenario again

Certain events have functional consequences and will be selected out. The strength and localization of this selection is of great interest.

The selection criteria could in principle be anything, but selection against amino acid changes is by far the most important.
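Whether a nucleotide change alters the amino acid at all can be checked directly against the genetic code. The sketch below builds the standard genetic code table and classifies single-codon changes as synonymous or nonsynonymous. The function names and the example codons are illustrative choices.

```python
# Minimal sketch: classify codon changes with the standard genetic code.
# ASSUMPTIONS: DNA in upper case, standard (nuclear) genetic code,
# illustrative function names and example codons.

from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid."""
    if CODON_TABLE[codon_before] == CODON_TABLE[codon_after]:
        return "synonymous"
    return "nonsynonymous"


protein = translate("ACGTCA")                # Thr-Ser
first_change = classify_change("TCA", "CCA")   # Ser -> Pro
second_change = classify_change("CCA", "CCG")  # Pro -> Pro
```

Only the nonsynonymous class changes the protein, so it is the class on which selection against amino acid changes can act; synonymous changes leave the encoded sequence untouched.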

Domain combination and recombination

One mechanism to ensure the generation of different partners is gene duplication followed by divergence

In some cases catalytic domains can be formed by the contribution of both duplicates (paralogues)

Serine proteinase domains

In others, gene duplication provides an additional regulatory function through the development of an oligomeric protein

Mutations in the tetrameric structure of haemoglobin can alter its allosteric efficiency in transporting oxygen

Proteins can combine gene duplication or fusion with generation of partners by domain swapping

[Figure: IL-5; a two-domain monomer (domains A and B) and a domain-swapped dimer]

Families of related proteins tend to retain similar folding patterns

The general folding pattern of a protein is usually preserved even when amino acids are substituted. The amount of structural distortion, however, increases locally as the amino acid sequence divergence between two proteins increases

These distortions are not uniformly distributed over the structure: the core appears to preserve the folding pattern within a family, while other parts of the structure suffer dramatic distortions

In the overwhelming majority of proteins, the core is formed by the main elements of secondary structure and the peptides flanking them, including active-site peptides

The fraction of identical residues in the core measures the amount of sequence divergence between two proteins

In proteins that share more than 60% of their amino acids, the core contains more than 90% of the residues; the refolding of the remaining 10% involves only minor surface loops
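The fraction of identical residues between two aligned sequences is straightforward to compute. The sketch below is one possible convention, assumed here for illustration: columns containing a gap in either sequence are skipped and the result is expressed as a percentage of the remaining columns. The example sequences are hypothetical.

```python
# Minimal sketch: percent identity between two aligned sequences.
# ASSUMPTIONS: sequences are already aligned to equal length; columns with
# a gap in either sequence are excluded (other conventions exist).


def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of gap-free alignment columns with identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments differing at two of ten positions.
identity = percent_identity("GDVEKGKKIF", "GDVAKGKKTF")
```

Restricting the same calculation to the columns that belong to the core would give the core identity described above; which columns count as core has to come from a structure comparison and is not derived here.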

Pairwise Sequence Identities and Structure Similarity (SSAP) Scores in Domain Families

[Plot: structure similarity (SSAP score) against sequence identity (%), with pairs marked as same function or different function]

Biotin Carboxylase D-Ala D-Ala Ligase

ATP Grasp Superfamily

In distant homologues the structure can be embellished, but 50-60% of the structure in the core is highly conserved

Conservation of Protein Structure

The cores of protein structures are very well conserved during evolution, even when their sequences have changed considerably

Comparing protein structures allows us to identify more distant evolutionary relationships

Structural Genomics initiatives will give structures for most of the major protein families

Related structures: RMSD usually < 3.5 Å
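For reference, the root-mean-square deviation (RMSD) between two sets of equivalent atoms can be sketched as below. This assumes the two structures have already been superposed and that atoms are paired one to one in the same order; finding the optimal superposition is a separate step and is not shown. The coordinates are made up.

```python
# Minimal sketch: RMSD between two sets of paired atomic coordinates.
# ASSUMPTIONS: structures are already superposed, atoms are paired in
# order, coordinates are in angstroms and purely illustrative.

import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions; the second set is the first shifted by 1 Å in z.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

deviation = rmsd(structure_a, structure_b)
```

Because every atom in the toy example is displaced by exactly 1 Å, the RMSD is 1 Å, well inside the range quoted for related structures.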

Evolution of New Protein Functions

gene duplication

incremental mutations

gene fusion

oligomerisation

Protein structures can accommodate many but not all single-site mutations

Some of these single mutations are very important from the medical point of view:

SNPs can produce incorrect chain termination (some thalassaemias)

As qualitative rules, single mutations on the surface of proteins are usually innocuous, whereas mutations in important buried regions of the molecule tend to be lethal and are removed by selection (so we never observe them)

Natural protein variants are only a subset of all possible variants that have been subjected to natural selection

Artificial variants can extend our knowledge beyond what nature has sampled and show us the possible subsets of optimised proteins

The Allumwandlung technique consists of substituting a single amino acid by each of the other 19, testing the functional properties of the variants, and solving their crystal structures
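The sequence side of this experiment, enumerating the 19 single-site variants, can be sketched as follows. The function name, the toy sequence and the mutation labels (wild-type residue, 1-based position, new residue) are illustrative assumptions; the functional testing and crystallography obviously cannot be reproduced in code.

```python
# Minimal sketch: enumerate all 19 substitutions at one position.
# ASSUMPTIONS: one-letter codes for the 20 standard amino acids,
# 0-based position argument, hypothetical toy sequence.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(sequence, position):
    """Map mutation labels (e.g. 'T3A') to the corresponding variant sequences."""
    wild_type = sequence[position]
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"unexpected residue {wild_type!r}")
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue  # skip the wild-type residue itself
        label = f"{wild_type}{position + 1}{residue}"
        variants[label] = sequence[:position] + residue + sequence[position + 1:]
    return variants


# Hypothetical five-residue peptide; mutate the third residue (index 2).
variants = single_site_variants("MKTAY", 2)
```

Each entry in the resulting dictionary corresponds to one protein that would have to be expressed, assayed and crystallised.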

If we could predict the effect of single mutations on protein structure and function, that would be a first step towards designing improved proteins with clear relevance to public health