Posted on 19-Dec-2015
Tertiary protein structure: protein folding
Three main approaches:
[1] experimental determination (X-ray crystallography, NMR)
[2] Comparative modeling (based on homology)
[3] Ab initio (de novo) prediction (Dr. Ingo Ruczinski at JHSPH)
Experimental approaches to protein structure
[1] X-ray crystallography
– Used to determine 80% of structures
– Requires high protein concentration
– Requires crystals
– Able to trace amino acid side chains
– Earliest protein structure solved was myoglobin
[2] NMR
– Magnetic field applied to proteins in solution
– Largest structures: 350 amino acids (40 kDa)
– Does not require crystallization
Steps in obtaining a protein structure
Target selection
Obtain, characterize protein
Determine, refine, model the structure
Deposit in database
X-ray crystallography
http://en.wikipedia.org/wiki/X-ray_diffraction
Sperm Whale Myoglobin
PDB
• April 08, 2008 – 50,000 proteins, 25 new experimentally determined structures each day
[Figure: New PDB structures, new folds vs. old folds]
Ab initio protein prediction
• Starts with an attempt to derive secondary structure from the amino acid sequence
– Predicts the likelihood that a subsequence will fold into an alpha-helix, beta-sheet, or coil, using physicochemical parameters, hidden Markov models (HMMs), or artificial neural networks (ANNs)
– Able to predict about 3/4 of local structure correctly
Secondary structure prediction
Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α-helices, β-sheets, and turns.
Proline: occurs at turns, but rarely in α-helices.
GOR (Garnier, Osguthorpe, Robson): related algorithm
Modern algorithms: use multiple sequence alignments and achieve a higher success rate (about 70-75%)
Page 279-280
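To make the Chou-Fasman idea concrete, here is a minimal sketch that averages per-residue helix and sheet propensities over a sliding window. It is a simplification, not the published algorithm: the real method uses separate nucleation and extension rules plus turn parameters, which are omitted. The propensity values are as commonly tabulated from Chou and Fasman's work and should be checked against the original tables; the window size of 7 is an assumption.

```python
# Approximate Chou-Fasman conformational parameters (P_alpha, P_beta) per residue,
# as commonly tabulated; verify against the published tables before relying on them.
PROPENSITY = {
    "A": (1.42, 0.83), "R": (0.98, 0.93), "N": (0.67, 0.89), "D": (1.01, 0.54),
    "C": (0.70, 1.19), "Q": (1.11, 1.10), "E": (1.51, 0.37), "G": (0.57, 0.75),
    "H": (1.00, 0.87), "I": (1.08, 1.60), "L": (1.21, 1.30), "K": (1.16, 0.74),
    "M": (1.45, 1.05), "F": (1.13, 1.38), "P": (0.57, 0.55), "S": (0.77, 0.75),
    "T": (0.83, 1.19), "W": (1.08, 1.37), "Y": (0.69, 1.47), "V": (1.06, 1.70),
}

def predict_ss(seq: str, window: int = 7) -> str:
    """Label each residue H (helix), E (strand) or C (coil) from windowed mean propensities.

    Simplified sketch: no nucleation/extension rules and no turn parameters.
    """
    seq = seq.upper()
    half = window // 2
    labels = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        seg = [PROPENSITY[aa] for aa in seq[lo:hi] if aa in PROPENSITY]
        if not seg:  # window contains only unrecognized residue codes
            labels.append("C")
            continue
        p_alpha = sum(p[0] for p in seg) / len(seg)
        p_beta = sum(p[1] for p in seg) / len(seg)
        # Thresholds 1.03 (helix) and 1.05 (sheet) follow the usual Chou-Fasman cutoffs.
        if p_alpha > 1.03 and p_alpha >= p_beta:
            labels.append("H")
        elif p_beta > 1.05:
            labels.append("E")
        else:
            labels.append("C")
    return "".join(labels)

print(predict_ss("MEELLKKAEELLKRVIVTVYPGNG"))
```

Proline and glycine carry low P_alpha values here, which is why proline-rich stretches come out as coil rather than helix.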
Training the Network
• Use PDB entries with validated secondary structures
• Measures of accuracy
– Q3 score: percentage of residues whose state (helix, sheet, or coil) is predicted correctly; training on it favors predicting the most abundant state
– You get 50% if you just predict every residue to be coil
– Most methods get around 60% with this metric
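A small sketch of the Q3 computation, assuming structures are written as one character per residue using H, E, and C for helix, strand, and coil (that encoding is an assumption, not stated on the slide). The toy observed structure below is made up to be half coil, so the all-coil baseline lands at exactly 50%.

```python
def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose 3-state label (H, E, C) matches the observed label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical toy structure: 5 helix, 5 strand and 10 coil residues.
observed = "HHHHHEEEEECCCCCCCCCC"
print(q3("C" * len(observed), observed))  # all-coil baseline -> 50.0
```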
Correlation Coefficient
• How well the predictions for coils, helices, and beta-sheets correlate with the real structures
• This ignores what we really want to get at
– If the real structure has 3 coils, do we predict 3 coils?
• Segment overlap score (Sov) gives credit for how protein-like the predicted structure is, but it is correlated with Q3
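The slide does not name the correlation coefficient; the sketch below assumes the per-state Matthews correlation coefficient, which is the one usually reported for secondary-structure prediction. It also counts contiguous segments of a state, a deliberately simple stand-in for the "do we predict the right number of segments?" question (it is not the Sov formula). The toy strings are made up.

```python
import math
from itertools import groupby

def state_correlation(predicted: str, observed: str, state: str) -> float:
    """Matthews correlation coefficient for one state (e.g. 'H') against all other states."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0.0 when a marginal is empty (e.g. a state never predicted).
    return (tp * tn - fp * fn) / denom if denom else 0.0

def count_segments(ss: str, state: str) -> int:
    """Number of contiguous runs of `state` (e.g. how many helices) in a 3-state string."""
    return sum(1 for s, _ in groupby(ss) if s == state)

observed = "CCHHHHHHHHCC"  # one 8-residue helix
pred_a = "CCHHHHHHCCCC"    # helix too short: 10 of 12 residues correct
pred_b = "CCHHCHHCHHCC"    # helix broken in three: also 10 of 12 correct
for name, pred in [("pred_a", pred_a), ("pred_b", pred_b)]:
    print(name, count_segments(pred, "H"), round(state_correlation(pred, observed, "H"), 3))
```

Both toy predictions get the same Q3 and the same helix correlation (about 0.71), yet only pred_a predicts a single helix; per-residue scores cannot tell them apart, which is the gap segment-based scores such as Sov try to close.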
Fold recognition (structural profiles)
• Attempts to find the best fit of a raw polypeptide sequence onto a library of known protein folds
• A prediction of the secondary structure of the unknown is made and compared with the secondary structure of each member of the library of folds
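A minimal sketch of the comparison step, assuming each library fold is summarized by its secondary-structure string and that predicted and library strings are compared by a plain Needleman-Wunsch global alignment. The scoring values (+2 match, -1 mismatch, -2 gap) and the fold library are hypothetical; real fold-recognition methods combine richer information (for example sequence profiles) than this.

```python
def align_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score between two secondary-structure strings."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

def best_fold(predicted_ss: str, library: dict[str, str]) -> tuple[str, int]:
    """Return (fold name, score) for the library entry that best matches the prediction."""
    scores = {name: align_score(predicted_ss, ss) for name, ss in library.items()}
    name = max(scores, key=lambda n: scores[n])
    return name, scores[name]

# Hypothetical fold library: names and secondary-structure strings are made up.
LIBRARY = {
    "four_helix_bundle": "CHHHHHHHHCCHHHHHHHHCCHHHHHHHHCCHHHHHHHHC",
    "beta_sandwich": "CEEEEECCEEEEECCEEEEECCEEEEEC",
    "alpha_beta_roll": "CEEEECHHHHHHCEEEECHHHHHHC",
}
query = "CHHHHHHHCCHHHHHHHHCCHHHHHHHCCHHHHHHHHC"  # predicted structure of the unknown
print(best_fold(query, LIBRARY))
```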
Threading
• Takes the fold recognition process a step further
– Empirical energy functions for residue-pair interactions are used to mount the unknown sequence onto the putative backbone in the best possible manner
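The sketch below illustrates the idea in its simplest form: gapless threading of a query sequence onto a template backbone, scored by summing a pairwise energy over template positions that are in contact. Everything here is a hypothetical toy: the hydrophobic/polar reduction, the pair energies, and the 6-residue template with its contact list. Real threading methods use knowledge-based residue-pair potentials over all 20 amino acids (Miyazawa-Jernigan contact energies are one example) and allow gaps via dynamic programming.

```python
# Hypothetical two-letter reduction: hydrophobic (H) vs polar (P) residues.
HYDROPHOBIC = set("AVILMFWC")
# Hypothetical toy pair energies: only hydrophobic-hydrophobic contacts are favorable.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}

def threading_energy(segment: str, contacts: list[tuple[int, int]]) -> float:
    """Sum pair energies over template positions (i, j) that are in contact."""
    cls = ["H" if aa in HYDROPHOBIC else "P" for aa in segment]
    return sum(PAIR_ENERGY[(cls[i], cls[j])] for i, j in contacts)

def best_offset(query: str, template_len: int,
                contacts: list[tuple[int, int]]) -> tuple[int, float]:
    """Slide the query along the template without gaps; return the lowest-energy (offset, energy)."""
    if len(query) < template_len:
        raise ValueError("query is shorter than the template")
    candidates = (
        (off, threading_energy(query[off:off + template_len], contacts))
        for off in range(len(query) - template_len + 1)
    )
    return min(candidates, key=lambda c: c[1])

# Hypothetical template: 6 positions, where positions 0-5 and 1-4 touch in 3D.
print(best_offset("KKLAKKIVKK", 6, [(0, 5), (1, 4)]))
```

Lower energy means a better fit, so the offset that places hydrophobic residues at both contacting pairs wins.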
CATH
• CATH: Protein Structure Classification
• Class (C), Architecture (A), Topology (T), and Homologous superfamily (H)
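CATH writes a domain's classification as a dotted code with one number per level, for example 3.40.50.300. The sketch below parses such a code into the four levels listed on the slide and checks whether two domains share a topology, i.e. agree on the first three levels. Only the top-level class names 1 to 4 are mapped; the helper names and the second example code are illustrative assumptions.

```python
from typing import NamedTuple

# Top-level CATH classes; any other class number is reported as "unknown" here.
CATH_CLASSES = {1: "Mainly Alpha", 2: "Mainly Beta", 3: "Alpha Beta", 4: "Few Secondary Structures"}

class CathCode(NamedTuple):
    class_: int        # C: secondary-structure content
    architecture: int  # A: overall arrangement of secondary structures
    topology: int      # T: connectivity (fold)
    superfamily: int   # H: homologous superfamily

    @property
    def class_name(self) -> str:
        return CATH_CLASSES.get(self.class_, "unknown")

def parse_cath(code: str) -> CathCode:
    """Parse a dotted C.A.T.H code such as '3.40.50.300'."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def same_topology(a: CathCode, b: CathCode) -> bool:
    """Two domains share a topology when their class, architecture and topology agree."""
    return a[:3] == b[:3]

c = parse_cath("3.40.50.300")
print(c.class_name, same_topology(c, parse_cath("3.40.50.720")))
```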