1001 Stories of Protein Folding
Ming Li
School of Computer Science
University of Waterloo
CS882, Fall 2006
By the time I finish telling these protein stories, I hope we know better how to fold them by computers.
Prelude: Why should you care?
Through 3 billion years of evolution, nature has created an enormous number of protein structures for different biological functions. Understanding these structures is key to proteomics. Fast computation of protein structures is one of the most important unsolved problems in science today. Much more important than, for example, the P≠NP conjecture.
We now have a real chance to solve it.
This course:
I do ½ of the course, so that we understand everything about proteins.
You do ½ of the course, to present all methods for protein folding. 50% marks.
You do a final project designing your method for folding proteins. 50% marks.
Proteins – the life story
Proteins are building blocks of life. In a cell, 70% is water and 15%-20% are proteins.
Examples:
hormones – regulate metabolism
structures – hair, wool, muscle, …
antibodies – immune response
enzymes – chemical reactions
Sickle-cell anemia: the hemoglobin protein is made of 4 chains, 2 alpha and 2 beta. A single mutation from Glu to Val occurs at residue 6 of the beta chain. The trait is recessive: homozygotes die, but heterozygotes have resistance to malaria, hence the mutation had some evolutionary advantage in Africa.
[Figure: DNA double helix with paired bases (A–T, C–G). DNA is transcribed into mRNA (A, C, G, U), which is translated into protein (20 amino acids); reverse transcription of mRNA gives cDNA.]
Human: 3 billion bases, 30k genes. E. coli: 5 million bases, 4k genes.
Codon: three nucleotides encode an amino acid. 64 codons, 20 amino acids, some with more than one codon.
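The codon idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it builds the standard genetic code table (64 codons over U, C, A, G) and translates a short mRNA string. The names `CODON_TABLE` and `translate` and the example sequences are made up for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GAG"))  # Glu
    print(translate("GUG"))  # Val
```

Note that GAG and GUG differ in a single base, so one point mutation is enough to turn a Glu (E) into a Val (V), the kind of change described for sickle-cell anemia.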
Proteins are built from 20 amino acids and fold in space into functional shapes.
Several polypeptide chains can form more complex structures:
What happened in sickle-cell anemia
Mutating to Valine. Hydrophobic patch on the surface.
Hemoglobin
Amino acids stories
There are 500 amino acids in nature. Only 20 (22) are used in proteins.
The first amino acid was discovered in asparagus in 1806, hence the name Asparagine. All 20 amino acids in proteins were discovered by 1935.
Traces of glycine, alanine, etc. were found in a meteorite in Australia in 1969. This led to the conjecture that life has an extraterrestrial origin.
20 Amino acids – the boring part
Polar amino acids: Serine, Threonine, Tyrosine, Histidine, Cysteine, Asparagine, Glutamine, Tryptophan
Hydrophobic amino acids: Alanine, Valine, Phenylalanine, Proline, Methionine, Isoleucine, Leucine
Charged amino acids: Aspartic acid, Glutamic acid, Lysine, Arginine
Simplest amino acid: Glycine
Polar: one positively and one negatively charged end, e.g. H2O is polar, oil is non-polar.
Neutral / Non-polar
Why do proteins fold? Some philosophy
On its own, folding is thermodynamically unfavorable for the chain because it reduces the disorder, or entropy, of the protein. So, why do proteins fold? One of the most important factors driving the folding of a protein is the interaction of polar and nonpolar side chains with the environment. Nonpolar (water-hating) side chains tend to push themselves to the inside of a protein, while polar (water-loving) side chains tend to place themselves on the outside of the molecule. In addition, other noncovalent interactions, including electrostatic and van der Waals interactions, make the folded protein slightly more stable than the unfolded one.
When oil, a nonpolar, hydrophobic substance, is placed in water, the two push each other away.
Since proteins have nonpolar side chains, their reaction in a watery environment is similar to that of oil in water. The nonpolar side chains are pushed to the interior of the protein, allowing them to avoid water molecules and giving the protein a globular shape. There is, however, a substantial difference in how the polar side chains react to the water. The polar side chains place themselves on the outside of the protein molecule, which allows them to interact with water molecules by forming hydrogen bonds. Folding increases entropy by placing the nonpolar side chains on the inside, which in turn compensates for the decrease in entropy as hydrogen bonds form between the polar side chains and water molecules.
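To make the hydrophobic/polar distinction concrete, here is a small sketch that reduces a sequence to a two-letter pattern: H for hydrophobic residues, P for everything else. The hydrophobic set follows the grouping given earlier in these notes (Ala, Val, Phe, Pro, Met, Ile, Leu); other textbooks draw the line differently. The function names and the example peptides are hypothetical, and this two-class reduction is a simplification for illustration, not a folding method.

```python
# Hydrophobic residues (one-letter codes) as grouped in these notes:
# Ala, Val, Phe, Pro, Met, Ile, Leu. This grouping is an assumption taken
# from the slides; other classifications exist.
HYDROPHOBIC = set("AVFPMIL")


def hp_pattern(sequence):
    """Map each residue to 'H' (hydrophobic) or 'P' (polar, charged, or Gly)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def hydrophobic_fraction(sequence):
    """Fraction of residues classified as hydrophobic (0.0 for an empty sequence)."""
    pattern = hp_pattern(sequence)
    return pattern.count("H") / len(pattern) if pattern else 0.0


if __name__ == "__main__":
    # Hypothetical peptide before and after a Glu -> Val change at position 6.
    print(hp_pattern("VHLTPEE"))
    print(hp_pattern("VHLTPVE"))
```

Replacing a Glu with a Val flips one position from P to H, which is how a single mutation can create a hydrophobic patch on a protein surface.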
1 letter label & how to remember them
If only one amino acid begins with a letter, that letter is used:
C = Cys = Cysteine
H = His = Histidine
I = Ile = Isoleucine
M = Met = Methionine
S = Ser = Serine
V = Val = Valine
Otherwise the letter is assigned to the more frequent one:
A = Ala = Alanine
G = Gly = Glycine
L = Leu = Leucine
P = Pro = Proline
T = Thr = Threonine
The losers try phonetically:
F = Phe = Phenylalanine
R = Arg = Arginine
Y = Tyr = Tyrosine
W = Trp = Tryptophan (double ring)
When everything fails:
D = Asp = Aspartic acid
N = Asn = Asparagine
E = Glu = Glutamic acid
Q = Gln = Glutamine
K = Lys = Lysine
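For working with sequences on a computer, the labels above are usually stored as a lookup table. The following is a minimal sketch; the helper names `to_three_letter` and `to_one_letter` are made up for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_three_letter(sequence):
    """'EV' -> 'Glu-Val'. Raises KeyError on a non-standard letter."""
    return "-".join(ONE_TO_THREE[aa] for aa in sequence.upper())


def to_one_letter(residues):
    """['Glu', 'Val'] -> 'EV'. Accepts any capitalization of the names."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


if __name__ == "__main__":
    print(to_three_letter("EV"))
    print(to_one_letter(["TRP", "lys"]))
```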
They really look all the same:
One amino acid. The difference is only in the side chain R.
Many amino acids connected into a polypeptide chain.
The amino acids are connected to form polypeptide chains, going from the N-terminus to the C-terminus.
Peptide bond: planar, rigid, with known bond distances and angles.
Lose water (H2O) when forming the peptide bond.
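The water loss can be checked by simple atom bookkeeping: a chain of n amino acids has n - 1 peptide bonds, so its molecular formula is the sum of the free amino acid formulas minus n - 1 water molecules. The sketch below covers only a handful of amino acids to keep it short; the table name `FORMULA` and the function `peptide_formula` are hypothetical.

```python
from collections import Counter

# Molecular formulas of a few free amino acids (one-letter codes).
# Only a small subset is included here to keep the example short.
FORMULA = {
    "G": Counter(C=2, H=5, N=1, O=2),        # Glycine
    "A": Counter(C=3, H=7, N=1, O=2),        # Alanine
    "S": Counter(C=3, H=7, N=1, O=3),        # Serine
    "C": Counter(C=3, H=7, N=1, O=2, S=1),   # Cysteine
    "V": Counter(C=5, H=11, N=1, O=2),       # Valine
    "E": Counter(C=5, H=9, N=1, O=4),        # Glutamic acid
}
WATER = Counter(H=2, O=1)


def peptide_formula(sequence):
    """Atom counts of a linear peptide: sum of residues minus one H2O per peptide bond."""
    total = Counter()
    for aa in sequence:
        total.update(FORMULA[aa])
    for _ in range(len(sequence) - 1):
        total.subtract(WATER)
    return dict(total)


if __name__ == "__main__":
    print(peptide_formula("GG"))  # diglycine
```

For example, two glycines (C2H5NO2 each) joined by one peptide bond give C4H8N2O3, one H2O fewer than the sum of the parts.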
They could have been different
L-form vs D-form: looking down the H-Cα bond from H, the groups read clockwise spell CO-R-N ("CORN") in the L-form and N-R-CO in the D-form.
All amino acids occurring in proteins have the L-form.
It is unclear why the D-form was not chosen.
In nature, L- and D-forms occur with equal chance.
In functioning proteins, only the L-form occurs.
Mirror image
Story of cysteines
Two cysteine residues in different (non-adjacent) parts of a protein sequence can be oxidized to form a disulfide bridge, the end product of air oxidation:
2 cysteines + ½ O2 = 2 linked cysteines + H2O
Disulfide bridges have two functions:
Stabilizing a single protein fold
Linking two chains (e.g. linking the A and B chains in insulin)
Disulfide bond between two cysteines:
Cysteine:
SH
|
CH2
|
Note: We will not study amino acids one by one, but we will study their structures when we meet them. The red bond connects to Cα.
The Φ and Ψ angles
The rotation angle about the N-Cα bond is the Φ angle.
The rotation angle about the Cα-C’ bond is the Ψ angle.
The side chain (attached at Cα) is not involved.
These angles determine backbone structure.
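Φ and Ψ are dihedral (torsion) angles, each defined by four consecutive backbone atoms: Φ by C’(i-1), N, Cα, C’ and Ψ by N, Cα, C’, N(i+1). A minimal sketch of how such an angle can be computed from 3D coordinates follows; the function name `dihedral_deg` and the test coordinates are made up for illustration, and the sign follows the usual convention (positive for a clockwise rotation when looking along the central bond).

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: central bond along z, outer atoms a quarter turn apart.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Given real backbone coordinates, calling this once per residue with the four atoms for Φ and once with the four atoms for Ψ yields the pair of angles that a Ramachandran plot displays.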
The Ramachandran plot
L-amino acids cannot form large left-handed helices, but Gly (also Asn, Asp) can form short left-handed helices, with the side chain forming a hydrogen bond with the main chain.
Red: good. Yellow: ok. White: forbidden.
Except Glycine
The story of Glycine
Glycine has no side chain (just H), so it can adopt phi and psi angles in all 4 quadrants of the Ramachandran plot.
Thus, it frequently occurs in turn regions of proteins, where any other residue would be sterically hindered.
Glycine:
H
|
Staggered carbon atoms for side chains
Ethane: CH3CH3
Aligned: too crowded
Staggered: most favorable, repeating at 120° rotations
Valine: (b) is more favorable, least crowded
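The 120° periodicity for ethane can be illustrated with a simple threefold cosine torsion potential. This is a common textbook model, not something taken from these slides, and the barrier height used here (about 12 kJ/mol) is an approximate, assumed value. The angle is measured from the aligned (eclipsed) arrangement.

```python
import math

# Assumed, approximate rotational barrier of ethane in kJ/mol.
BARRIER_KJ_PER_MOL = 12.0


def torsion_energy(theta_deg, barrier=BARRIER_KJ_PER_MOL):
    """Threefold torsion potential: maximal when aligned (0 deg), zero when staggered."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * theta_deg)))


def staggered_angles(step=1):
    """Angles in [0, 360) at which the model energy is (numerically) zero."""
    return [t for t in range(0, 360, step) if torsion_energy(t) < 1e-9]


if __name__ == "__main__":
    for angle in (0, 60, 120, 180):
        print(angle, round(torsion_energy(angle), 3))
```

In this model the aligned arrangements at 0°, 120° and 240° sit at the top of the barrier, and the staggered arrangements at 60°, 180° and 300° are the minima, which is the "most favorable, plus 120° rotations" picture. For ethane the three minima are equivalent; for a side chain such as valine they are not, because the substituents differ.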