Precision and Accuracy of NMR Structures

Post on 08-Aug-2020

Page 1: Precision and Accuracy of NMR Structures · F-measure is the overall performance score calculated from the recall and precision. It provides measure of the overall fit between the

Precision and Accuracy of NMR Structures

Page 2

NMR Structure Determination in the NESG

[Diagram: NESG pipeline, with labels: Protein Production; NMR; Structure Determination & Validation; Structure Validation; SPINS; HarvestDB; Structure Gallery; PDB entry; BMRB entry; Structures!]

www.nesg.org

Page 3

Overview

(i) The Problem of Precision and Accuracy in Protein NMR

(ii) Assessing Precision of NMR Structures

(iii) Assessing Accuracy of NMR Structures
- PSVS software
- RPF software

(iv) Summary

Page 4

Shortcomings of Protein NMR Field

No Standard Conventions for Estimating Precision

Precision: Uncertainty in atomic positions indicated by the uncertainty in the data and underlying structural assumptions

How tightly the shots cluster

No Standard Conventions for Estimating Accuracy

Accuracy: Similarity of model to the actual structure(s) present in the NMR tube

How close you get to the bull’s eye
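The target analogy can be made numeric. The following is a minimal sketch, assuming hypothetical 2D shot coordinates and a bull's eye at the origin: precision is taken as the RMS spread of the shots about their own mean, and accuracy as the distance of that mean from the target. The function name and the data are illustrative only.

```python
import math

def spread_and_offset(shots, target=(0.0, 0.0)):
    """Return (RMS spread about the mean shot, distance of the mean shot from the target)."""
    n = len(shots)
    mean_x = sum(x for x, _ in shots) / n
    mean_y = sum(y for _, y in shots) / n
    # Precision: how tightly the shots cluster around their own mean
    spread = math.sqrt(
        sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in shots) / n
    )
    # Accuracy: how far the mean shot lies from the bull's eye
    offset = math.hypot(mean_x - target[0], mean_y - target[1])
    return spread, offset

# Hypothetical data: a tight cluster far from the bull's eye (precise, not accurate)
tight_but_off = [(3.0, 3.1), (3.1, 3.0), (2.9, 3.0), (3.0, 2.9)]
# Hypothetical data: a loose cluster centered on the bull's eye (accurate, not precise)
loose_but_centered = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

print(spread_and_offset(tight_but_off))
print(spread_and_offset(loose_but_centered))
```

A small spread says nothing about the offset, which is why a tight NMR ensemble is not by itself evidence of an accurate structure.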

Page 5

Challenges and Issues

Estimating Precision

RMSD - many issues.
- which atoms to include in superimposition? BB vs. heavy vs. H’s
- should we define “core” as a mix of BB and SC? Include H atoms?
- no standard of sampling: i.e., X structures / Y structures calculated
- how to represent disordered regions? Single structure with atom-specific uncertainties?

Estimating Accuracy
- Relaxation matrix (CORMA); R-fac; RPF; validation with RDC data.
- Knowledge-based assessment (ProCheck, WhatIf, MolProbity, etc), cf. crystal structures.

Constraint violations
- Constraints are interpreted data
- No standard for calibrating constraints

Constraints per residue
- Conformationally-restraining
- Constraints per restrained residue
- How to define restrained residue?

ProCheck / MAGE
- Derived from crystal structures
- Bona fide differences biologically relevant?
- Which residues to include/exclude?

Back calculation of NOEs - Relaxation Matrix
- Compare to NOESY Peak List?
- Exchange broadening, lineshape, differential relaxation effects
- Diagonal, ridges, overlap, residual water

Cross validation with RDC
- Not measured universally

Back calculation of Chemical Shift

H-bond Geometry

Page 6

How to Assess Precision?

David Snyder, Roberto Tejero

Page 7

Defining “Core” for Superimposition Using Dihedral Angle Order Parameter

Hyberts and Wagner, 1992

Best convention to date

“Ordered Residue”: S(φ) + S(ψ) > 1.8

How to distinguish surface loop from interdomain linker?

Stopping rule: 1 core or 2 cores? Or more?

Should we restrict core atom sets to backbone atoms?
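The ordered-residue criterion can be sketched in a few lines. This is a minimal illustration, assuming the usual circular-statistics form of the order parameter, S = |sum of unit vectors| / N over the N conformers, so that S = 1 when every conformer has the same angle and S approaches 0 for uniformly scattered angles. The function names and the φ/ψ values are hypothetical.

```python
import math

def angular_order_parameter(angles_deg):
    """S = |sum of unit vectors| / N for one dihedral angle across N conformers."""
    n = len(angles_deg)
    sum_cos = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_sin = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_cos, sum_sin) / n

def is_ordered(phi_deg, psi_deg, cutoff=1.8):
    """Apply the criterion S(phi) + S(psi) > cutoff to one residue."""
    return angular_order_parameter(phi_deg) + angular_order_parameter(psi_deg) > cutoff

# Hypothetical phi/psi values (degrees) for one residue in a 4-conformer ensemble
helix_phi = [-60.0, -62.0, -58.0, -61.0]
helix_psi = [-45.0, -43.0, -47.0, -44.0]
loop_phi = [-60.0, 60.0, 180.0, -120.0]
loop_psi = [150.0, -30.0, 60.0, -120.0]

print(is_ordered(helix_phi, helix_psi))
print(is_ordered(loop_phi, loop_psi))
```

Working with unit vectors rather than raw angles avoids the wrap-around problem at ±180°, where a naive standard deviation of the angle values would be misleading.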

Page 8

FindCore - Identify “Well Defined” Core Atom Sets

D.A. Snyder and G.T. Montelione, PROTEINS 2005, 59: 673-686
“Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles.”

Inter-atomic Variance Matrix (IVM): matrix of variances in inter-atomic distances. Can be used to partition core atoms into “ordered” vs. “disordered”, and to identify “domains”
- Nilges, Clore, Gronenborn, 1987
- Gerstein & Altman, 1995
- Kelley, Sutcliffe, et al 1996, 1997
- Gelfand, et al 1998

FindCore: can define BB or heavy atom core atom sets
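An inter-atomic variance matrix is straightforward to compute from an ensemble. The sketch below is an illustration only and is not the FindCore clustering algorithm: it assumes each model is a list of (x, y, z) tuples in the same atom order, and it summarizes each atom by its mean variance to all other atoms, which is a simplification introduced here. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def interatomic_variance_matrix(ensemble):
    """Return {(i, j): variance of the i-j distance across models} for i < j."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    ivm = {}
    for i, j in combinations(range(n_atoms), 2):
        dists = [math.dist(model[i], model[j]) for model in ensemble]
        mean = sum(dists) / n_models
        ivm[(i, j)] = sum((d - mean) ** 2 for d in dists) / n_models
    return ivm

def mean_variance_per_atom(ivm, n_atoms):
    """Average each atom's distance variances over all of its partners."""
    totals = [0.0] * n_atoms
    for (i, j), var in ivm.items():
        totals[i] += var
        totals[j] += var
    return [t / (n_atoms - 1) for t in totals]

# Hypothetical 3-model ensemble: atoms 0-2 form a rigid unit, atom 3 moves.
# The second model is translated by (10, 10, 10) and is deliberately not superimposed.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 0.0, 0.0)],
    [(10.0, 10.0, 10.0), (11.0, 10.0, 10.0), (10.0, 11.0, 10.0), (10.0, 15.0, 10.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 5.0)],
]

ivm = interatomic_variance_matrix(ensemble)
per_atom = mean_variance_per_atom(ivm, n_atoms=4)
print(per_atom)
```

Because only inter-atomic distances enter the calculation, the result does not depend on how, or whether, the models were superimposed. That property is what makes a distance-variance approach useful for choosing the atoms to superimpose in the first place.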

Page 9

How to Assess Precision?

We have “convention” to define “core atom sets” for superimposition, but no convention for generating the ensemble, and no standard of precision

Page 10

How to Assess Accuracy?

Yuanpeng Janet Huang, Aneeban Bhattacharya,

Dehua Hang, Roberto Tejero

Page 11

Protein Structure Validation Software (PSVS) Suite

A. Bhattacharya, R. Tejero, G.T. Montelione (2007)

PROTEINS 66:778-795.

Page 12

Protein Structure Validation Software (PSVS)

Bhattacharya, Tejero, Montelione, PROTEINS. 2007

Tool(s): Parameter(s) evaluated

PDBStat and FindCore (Tejero and Montelione; Snyder and Montelione): Analyze number of conformationally-restricting constraints, violations of constraints, define ordered regions of structure and calculate RMSD of atomic coordinates, identify conformationally restricting restraints

RPF (Huang, et al, JACS 2005 127:1665): Goodness-of-fit of NMR structure with NOESY data

DSSP (Kabsch & Sander, Biopolymers 1983 22: 2577): Calculate secondary structure

PROCHECK G-factors (for backbone and all dihedrals) (Laskowski et al, 1996 JBNMR 8: 477): Probability of dihedral angles of a residue type to be within a given range

MolProbity (MAGE, prekin, probe, reduce) (Lovell S C, et al, Proteins 2003 50: 437) (Word et al, 2000 Prot Sci 9: 2251): Calculate and visualize bad contacts and atomic overlaps, and Cβ deviations

Verify3D (Luthy et al, 1992 Nature 356: 83): Likelihood of the amino acid sequence to have the three-dimensional packing seen in the structure

ProsaII (Sippl, 1990 J Mol Biol 213: 859): Energy of pair-wise interaction from the spatial separation of atoms (Cβ atoms)

PDB validation software: Close contacts, deviations of bond length and bond angle from ideality

Page 13

Protein Structure Validation Software Suite (PSVS)

Page 14

“Rules of Thumb”

I. ProCheck (all) and MolProbity best distinguish low, medium, and high resolution crystal structures

II. Verify3D and ProsaII best distinguish incorrect folds

Page 15

ProCheck and MolProbity Z Scores: X-ray

[Figure: Z scores for X-ray structures grouped by resolution: < 1.8 Å; 1.8 - 2.5 Å; 2.5 - 3.5 Å]

Structures determined with higher resolution data have better Z scores, suggesting that these scores do in fact track structure accuracy

Bhattacharya, Tejero, Montelione, PROTEINS. 2007
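A Z score places a raw validation score on a common scale relative to a reference set of structures. The following is a minimal sketch, assuming Z = (raw score - reference mean) / reference standard deviation. The reference values are hypothetical placeholders and are not the PSVS calibration data, and the sign convention (more negative is worse) is an assumption that would need flipping for scores where lower raw values are better.

```python
import statistics

def z_score(raw, reference_scores):
    """Standardize a raw score against a reference distribution."""
    mean = statistics.mean(reference_scores)
    # Population standard deviation; the sample standard deviation is an equally valid choice
    sd = statistics.pstdev(reference_scores)
    return (raw - mean) / sd

# Hypothetical raw scores for a reference set of high-resolution structures
reference = [-0.2, -0.1, 0.0, 0.1, 0.2]

print(z_score(0.0, reference))   # a structure typical of the reference set
print(z_score(-0.5, reference))  # a structure scoring well below the reference set
```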

Page 16

ProCheck and MolProbity Z Scores

[Figure panels: X-ray; NMR; Following NMR Structure Refinement]

Why NMR different from X-ray?

- “Solution structure”

- Multiple conformational states?

- Less accurate structures?

Bhattacharya, Tejero, Montelione, PROTEINS. 2007

Page 17

Quality Scores for NESG NMR Structures Continue to Increase

[Figure: two bar charts of scores by year (2000 to 2006, plus Average) for X-Ray and NMR structures. Panels: ProCheck All Dihedrals; MolProbity Clash Score]

[Figure: plot of ProCheck(All) against MolProbity, colored by year. 2006: red; 2005: green; 2004: blue; 2003: black; 2002: magenta; 2000-2001: yellow]

Page 18

151 NMR - Crystal Structure Pairs

Filtered to be in same ligand state, similar pH

Analysis for FindCore core (bb and sc) atoms only

Line - rmsd of superimposed NMR ensemble “PRECISION”

Shade - rmsd between median NMR conformer and Xtal structure “ACCURACY”

Andrec, Snyder, Montelione, Levy, et al

1. NMR overestimates precision of the ensemble

2. NMR provides inaccurate global structure
- Ensemble averaging
- Just plain wrong

3. Xray is inaccurate

4. Crystallization shifts global conformational equilibria

Need to compare NMR parameters in solution and crystal - ssNMR

Page 19

NMR RPF Scores: Protein NMR Structure Quality Assessment by Rapid Comparison of NOESY and 3D Structure Data

Y. J. Huang, R. Powers, G.T. Montelione (2005)

J. Am. Chem. Soc. 127: 1665-74

Page 20

NMR “R-factors” - RPF Quality scores

[Diagram: 3D Structure compared with NOESY Peak List / Assignment List]

Essentially, a comparison of calculated and observed contact maps

Goodness-of-fit of the NOESY peak list data with 3D structure.

Violations map to the 3D structure and to the NOESY spectrum
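The contact-map comparison can be sketched as set operations. This is a minimal illustration, assuming proton coordinates keyed by hypothetical atom names, a hypothetical 5 Å distance cutoff, and an already unambiguous list of observed NOESY contacts. Real RPF analysis works from peak lists with ambiguous assignments, which this sketch does not model.

```python
import math
from itertools import combinations

def calculated_contacts(protons, cutoff=5.0):
    """Pairs of protons closer than the cutoff (in Å) in the structure."""
    return {
        frozenset((a, b))
        for a, b in combinations(protons, 2)
        if math.dist(protons[a], protons[b]) < cutoff
    }

def compare_contacts(observed, calculated):
    """Split contacts into true positives, false negatives, and false positives."""
    tp = observed & calculated   # observed and consistent with the structure
    fn = observed - calculated   # observed, but the structure has no short distance
    fp = calculated - observed   # short distance in the structure, but no observed peak
    return tp, fn, fp

# Hypothetical proton coordinates (Å) and observed contacts
protons = {
    "A.HN": (0.0, 0.0, 0.0),
    "B.HA": (3.0, 0.0, 0.0),
    "C.HB": (0.0, 4.5, 0.0),
    "D.HN": (20.0, 0.0, 0.0),
}
observed = {frozenset(("A.HN", "B.HA")), frozenset(("A.HN", "D.HN"))}

tp, fn, fp = compare_contacts(observed, calculated_contacts(protons))
print(len(tp), len(fn), len(fp))
```

Each false negative points at a NOESY peak to re-examine, and each false positive points at a region of the structure to re-examine, which is how violations map back to both the spectrum and the 3D structure.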

Page 21

NMR “R-factors” - RPF Quality scores

Recall: percentage of peaks detected in the NMR experiments that are consistent with the interproton distances of the 3D structures. Recall violations (false negatives) are NOESY peaks not consistent with the 3D structure. Recall = TP / (TP + FN)

Precision: percentage of close distance proton pairs in the query structures whose back-calculated NOE cross peaks are also actually detected in NMR experiments, weighted by their distances as d(h1, h2)^-6. Precision violations (false positives) are short distances in the 3D structure with no corresponding NOESY cross peak. Precision = TP / (TP + FP)

F-measure: the overall performance score calculated from the recall and precision. It provides a measure of the overall fit between the query model structure and the experimental data. F-measure = (2 x Recall x Precision) / (Recall + Precision)

DP-score: measures how well the query structure is distinguished from the freely-rotating chain model, scaled to the completeness of the NOESY data (normalized F-measure score).

[Diagram: 3D Structure compared with NOESY Peak List / Assignment List]
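The score definitions on this slide translate directly into code. The sketch below uses unweighted counts, so it omits the distance weighting the slide applies to precision. The DP normalization shown, (F_query - F_free) / (F_ideal - F_free), is an assumption based on the slide's description of a normalized F-measure and should be checked against the RPF paper. All function and parameter names are illustrative.

```python
def recall(tp, fn):
    """Fraction of observed peaks that are consistent with the structure."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of short structural distances that have an observed peak (unweighted)."""
    return tp / (tp + fp)

def f_measure(r, p):
    """Harmonic mean of recall and precision."""
    return 2 * r * p / (r + p)

def dp_score(f_query, f_free, f_ideal):
    """Assumed normalization: 0 for the freely rotating chain, 1 for an ideal fit."""
    return (f_query - f_free) / (f_ideal - f_free)

# Hypothetical counts: 80 true positives, 20 false negatives, 10 false positives
r = recall(tp=80, fn=20)
p = precision(tp=80, fp=10)
print(r, p, f_measure(r, p))
```

The harmonic mean penalizes an imbalance between the two scores: a structure with high precision but poor recall cannot reach a high F-measure.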

Page 22

Recall and Precision Violations

[Figure: three structures with N and C termini labeled, one per set of scores below]

Recall = 0.825, Precision = 0.971, F = 0.892 and DP = 0.723

Recall = 0.769, Precision = 0.969, F = 0.857 and DP = 0.629

Recall = 0.729, Precision = 0.917, F = 0.812 and DP = 0.508
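The three F values can be checked against the harmonic-mean formula for the F-measure using only the recall and precision values reported on this slide. The DP values cannot be checked the same way, because they also depend on reference F-measures that the slide does not give.

```python
def f_measure(recall, precision):
    """F-measure = (2 x Recall x Precision) / (Recall + Precision)."""
    return 2 * recall * precision / (recall + precision)

# (Recall, Precision, reported F) as listed on the slide
reported = [
    (0.825, 0.971, 0.892),
    (0.769, 0.969, 0.857),
    (0.729, 0.917, 0.812),
]

for r, p, f in reported:
    print(f"computed F = {f_measure(r, p):.3f}, reported F = {f}")
```

All three computed values agree with the reported ones to three decimal places.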

Page 23

Sensitivity of the quality scores

[Figure: three bar charts, one each for FGF-2, IL-13, and MMP-1, showing recall, precision, F-score, and DP-score (%, 0 to 100). Series: Freely Rotating Chain; Incorrect Fold I (beta); Incorrect Fold II (alpha); Incorrect Fold III (alpha+beta); AutoStructure/DYANA; AutoStructure/XPLOR; Expert I; Expert II; G-Ideal]

Page 24

Sensitivity of the quality scores

[Figure: four scatter plots of Recall, Precision, F-measure, and DP-score (0 to 1) against rmsd (0 to 12) for FGF-2, MMP-1, and IL-13 structures, grouped as < 2 Å, Partially correct, and Different fold. Values shown in parentheses on the panels: (-0.795), (-0.459), (-0.882), (-0.866)]

Page 25

Quality control of AutoStructure trajectories using AutoQF Scores

[Figure: F-score (%) and DP-score (%) (0 to 100) plotted against AutoStructure cycle (0 to 10) for FGF-2, MMP-1, and IL-13]

[Figure: Peaks assigned (%) (0 to 100), Mean Difference (Å) (0 to 10), and RMSD (Å) (0 to 10) for FGF-2, MMP-1, and IL-13 at Cycle 1, Cycle 2, Cycle 10, and Manual]

[Figure labels: Heavy-atom RMSD; Mean Differences; % Peak assigned; F; DP. Threshold annotations: >0.4; >0.6; >0.7; >0.9; < 1.0 Å]

Page 26

RPF Module in AutoStructure

Page 27

II. Peak Picking

example: StR5 project

[Figure labels: false positives; false negatives]

Page 28

HR2106 PSVS / RPF Analysis

Human dynein light chain 2A

[Figure panels: Calculated as a Monomer; Calculated as a Dimer]

MAGE Clash: Knowledge Based Assessment

RPF Precision Violations: Goodness of fit to NOESY data

Page 29

Input for PSVS / RPF

Xray Structure
- Coordinates
- Resolution, R, Rfree

NMR Structure
- Coordinates (ensemble)
- Constraint List: Dyana, Cyana, Xplor, CNS format

RPF Analysis
- Resonance Assignments: BioMagResDB format
- NOESY Peak List: Frequencies, Intensities

PSVS runs on any Web browser

PSVS results in minutes

E-mail sent to user

Page 30

Summary

(i) Precision: RMSD; FindCore

(ii) Accuracy: PSVS; RPF

value of multiple structure quality assessment scores

Other issues:
- BMRB: AVS software
- Presentation of Structures: stereoview

(v) Descriptions in PDB header.

- exact ordered and disordered residue ranges

Page 31

Summary

Page 32

Summary

Page 33

Acknowledgments

Gaetano Montelione

Software Developers:
- Hunter Moseley
- Janet Huang
- Mike Baran
- Dehua Hang
- Roberto Tejero

NMR Group:
- Paolo Rossi
- Swapna Gurla

Protein Production Group:
- Tom Acton
- Rong Xiao