emdataresource: structure data archiving, validation challenges · 2020-06-12 · project website...

38
EMDataResource: Structure Data Archiving, Validation Challenges Cathy Lawson EMDataResource & RCSB PDB Rutgers University S2C2 cryoEM Image Processing Workshop June 12, 2020

Upload: others

Post on 06-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

EMDataResource: Structure Data Archiving,

Validation ChallengesCathy Lawson

EMDataResource & RCSB PDBRutgers University

S2C2 cryoEM Image Processing Workshop June 12, 2020

Page 2: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Stanford University/SLAC

Rutgers University European Bioinformatics Institute

Unified Data Resource for 3DEM

■ Established 2007 under NIH Support to: ■ Develop Data Archives for 3DEM (EMDB + PDB)■ Promote Community Development of Validation and

Standards

Page 3: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Project Website

emdataresource.org

Updated every Wednesday

Page 4: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Growth of EM Archives

updated weekly: emdataresource.org/statistics.html

Page 5: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

EMDB maps by year and resolution

updated weekly: emdataresource.org/statistics.html

Page 6: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Finding Cryo-EM Structuresemdataresource.org/search.html

Page 7: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

EMDR Search: FeaturesAutocomplete: Interactive keyword help:

Sort, configure, filter, download search results:

Page 8: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

New: Simple Search

Page 9: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

New: Coronavirus Search

Page 10: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Cryo-EM Structure Deposition

EMDB, PDB

Page 11: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

wwPDB OneDep System

■ Deposition system for X-ray, NMR, and EM Structures■ EM Depositions:

■ Map receives EMDB id ■ Coordinate model receives PDB id

■ Validation report is produced

Page 12: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

File uploads for EM

deposition

Page 13: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

FSC Curve

■ Upload XML format file■ Create XML via a software package (e.g., Relion,

EMAN, cisTEM), or use PDBe’s FSC server

PDBe.org/FSC

Page 14: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

PDB Deposition Policies

• Map deposition to EMDB is mandatory for PDB depositions of 3DEM atomic coordinates

• MX atomic coordinates must be deposited in PDBx/mmCIF format • PDB format is still accepted for 3DEM and NMR (will change in future)

• Coordinates and experimental data share same release status (REL, HPUB, or HOLD). • Coordinate and experimental data are released simultaneously.

• ID’s issued only after mandatory metadata are provided• PI contact information and ORCiD ID are mandatory• Author-initiated coordinate versioning is allowed post release

wwpdb.org/documentation/policy

Page 15: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

3DEM Metadata Collection

• High-level classification of the EM experiment• Software used for data collection, data processing, data analysis,

structure calculations, and refinement• Sample description (e.g., assembly, virus)• Data collection (e.g., diffraction, imaging)• Sample preparation (e.g., specimen, buffer, tomography, condition)• Image processing and reconstruction • Structure analysis for 3D fitting of atomic coordinates

Page 16: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

mmCIF Data Dictionary for 3DEMTop Levelem_experimentem_software

Sample Descriptionem_entity_assemblyem_entity_assembly_molwtem_entity_assembly_naturalsourceem_entity_assembly_recombinantem_virus_entityem_virus_natural_hostem_virus_shell

Data Collectionem_diffractionem_diffraction_shellem_diffraction_statsem_image_recordingem_image_scansem_imagingem_imaging_optics

Sample/Specimen Preparationem_bufferem_buffer_componentem_crystal_formationem_embeddingem_sample_supportem_specimenem_stainingem_vitrification

em_fiducial_markers*em_focused_ion_beam*em_grid_pretreatment*em_high_pressure_freezing*em_shadowing*em_support_film*em_tomography*em_tomography_specimen*em_ultramicrotomy*

Image Processing & Reconstructionem_3d_reconstructionem_image_processingem_particle_selectionem_volume_selectionem_ctf_correction

em_2d_crystal_entityem_3d_crystal_entityem_helical_entityem_single_particle_entity

em_euler_angle_assignment*em_final_classification*em_start_model*

Structure Analysisem_3d_fittingem_3d_fitting_listem_fsc_curve*

Experimental Dataem_map*em_structure_factors*em_layer_lines*

*All categories are collected by the OneDep system. Data from most categories are archived in both PDB and EMDB; asterisked categories are archived only in EMDB.

Page 17: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Cryo-EM Structure Validation

Page 18: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Validation Report for EM Structuresv1 (2016-2019)

■ Map resolution as reported by depositor■ Model geometry statistics■ No fit-to-map validation

Page 19: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Validation Report for EM Structuresv2 (2020)

■ New:■ Map, Map+Model Images■ FSC curve(s)■ Rotationally averaged power spectrum■ Fit-to-Map: Atom inclusion at recommended contour

level■ Reports for both EMDB-only and EMDB+PDB

■ Right now:■ V2 reports are only available to depositors

■ Very Soon:■ Will be vailable for download from EMDB and PDB

archives

Page 20: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

■ Frontiers in Cryo-EM Validation Jan 2019

■ wwPDB Single-Particle Data Management Jan 2020

Ongoing Cryo-EM Validation Discussion

Page 21: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Preview of Some Outcomes/Recommendations

■ Map Resolution: ■ (unmasked, minimally filtered) half-maps should be

deposited ■ Data archive should calculate FSC from these half-

maps■ Global Map-Model Fit:

■ By Map-Model FSC (full curve, read at threshold=0.5)

Page 22: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Validation Challenges

Page 23: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

EM Validation Task Force 2010 Recommendations

■ Full FSC curve from independent half-maps

■ Model Stereochemistry same as X-ray / NMR

■ Other Metrics: More Research Needed

Henderson et al. (2012) Structure 20, 205-21410.1016/j.str.2011.12.014

Page 24: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Single Particle Cryo-EMin PDB:

Release Year vs Resolution

Validation in a Changing Landscape

■ How accurate are the maps?

■ Do the models conform to good valence geometry and stereochemistry?

■ How well do the models fit the maps?

■ What are the metrics for evaluation and are they good enough?

To answer these questions, we conducted a series of Challenges

Page 25: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Promoting Standards Development: ChallengesChallenge Workshop(s)

Resol(Å)

Goals for Participants and Assessors

2010 Model Challenge2011 Hawaii2012 Houston

2.5-24q Produce best models against selected mapsq Explore segmentation, secondary structure

detection, rigid body, flexible fitting, ab initio

2016-2017 Map Challenge2017 Lake Tahoe2017 Stanford/SLAC

2.5-5

q Produce best maps from selected raw images q Produce best models against selected mapsq Compare reconstruction, modeling practicesq Explore assessment strategies esp. map and

model fit-to-map

2016-2017 Model Challenge2015 Boston2017 New Orleans 2017 Stanford/SLAC

2019 Model “Metrics” Challenge2019 Stanford/SLAC

1.8-3.1 q Produce best models against selected mapsq Explore Model metrics with focus on Fit-to-Map

Page 26: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Challenge Workshops26

2011

2017

2019

Page 27: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2010 Model Challenge: Observations■ Established community

around a common problem■ Identified critical

standardization issues related to data deposition

■ Identified issues to explore in future challenges

■ Identified Challenges as mechanism to establish modeling benchmarks

10.1002/bip.22081

Page 28: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2016-2017 Map & Model Challenges: Observations

■ Innovative methods for map and model fit-to-map assessment were introduced

■ Map quality depended on participant level of experience

■ Maps reported as the same resolution looked different from each other

■ Models were “all over the place”

10.1016/j.jsb.2018.10.004

Page 29: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2016 Map & Model Challenges: Recommendations

qMap Resolution by independent half-map FSC: uniform definition + software implementation needed

qNovel model-based methods may be useful for estimating map resolvability

qNeeded: further review of fit-to-map metrics

Page 30: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

ADH 2.9 ÅAPOF 1.8 Å APOF 2.3 Å APOF 3.1 Å

2019 Challenge Targets

EMD-20026additional map #1

EMD-20027additional map #2

EMD-20028additional map #2

EMD-0406primary map

Page 31: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Challenge Pipeline

model-compare.emdataresource.org

Page 32: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Correlation

Full map density TEMPY CCC | PHENIX box_CC

Density within a mask TEMPy CCC_overlap | Segment Mander’s OverlapPHENIX CC_peaks | CC_volume | CC_mask

Density-derived functions TEMPY Mutual Information(MI) | MI_overlap | Laplacian Filtered

Density at atom positions MAPQ Q-score: vs Reference Gaussians (r=0-2 Å)

FSC curveSingle point PHENIX Resolution Map-Model FSC = 0.5

Integration CCPEM REFMAC5 FSCavg curve area to defined resolution limit

Atom Inclusion TEMPy Envelope | EMDB Atom Inclusion

Rotamer EMRinger Z-score protein Cg-atom paths around c1

ConformationBackbone

CaBLAM Cɑ-trace Cɑ-only virtual dihedrals CaBLAM Conformation Cɑ and CO-containing virtual dihedralsMOLPROBITY Ramachandran

Sidechain MOLPROBITY Rotamer

Valence Geometry PHENIX Bond | Bond angle | Chirality | Planarity | Dihedral

Clashes MOLPROBITY Clashscore

Energy PROQ3 energy and predicted features

Superposition

Cɑ Superposition OPENSTRUCT RMSD-Cɑ

Distance cutoffs OPENSTRUCT Global Distance Calculation (GDC) all | sidechain Global Distance Test (GDT) total score | high accuracy

Sequence assignment PHENIX seq match | Cɑ atom position match | overall score

*Multiple references DAVIS-QA average of pairwise GDT_TS scores

DistancesPer chain LDDT Local difference distance test

All chains OPENSTRUCT oligomeric LDDT | weighted oligomeric LDDT

Contacts

Contact area CAD Contact Area Difference

Shared contacts OPENSTRUCT Quaternary Structure (QS) best, global

Hydrogen bonds HBPLUS H-bond Precision all | nonlocal | Similarity all | nonlocal

Coordinates Only

Fit to Map

vs Reference Model

vs Models Consensus*

Page 33: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2019 Challenge: 4 Metrics Stood Out

CaBLAM: virtual dihedrals based on Cɑ, C=O compared to statistics from high quality PDB models [Williams 2018]

Coordinates alone:

MAPQ Q-score: Per-atom Correlation vs Reference Gaussian (r=0-2 Å) [Pintilie in press]

PHENIX Resolution Map-Model FSC = 0.5 [Afonine 2018]

EMRinger Z-score “rotamericity” of map for protein Cgatom paths around c1 [Barad 2015]

Fit-to-Map:

Page 34: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2019 Challenge: Observations

q ab initio methods represented in the challenge performed extremely well for near-atomic maps (1.8 - 3.1 Å)

q For evaluating conformation: CaBLAM was a useful “orthogonal” metric to Ramachandran statistics

q Within single map targets, all fit-to-map metrics were equivalent (similar model rankings)

q only Q-score, EMRinger, and Map-Model FSC @ 0.5 provided useful comparisons across map targets

Page 35: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

2019 Challenge: Recommendations

■ Most fit-to-map metrics are fine for optimization against a single experimental map

■ Resolution-sensitive metrics are preferred for ranking diverse structures in an archive

■ CaBLAM is a valuable new tool for evaluating protein backbone conformation

Page 36: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Topics for Future Challenges

■ Lower Resolution■ Membrane Proteins■ Nucleic Acids■ Models derived from Tomograms■ COVID-19

36

Page 37: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

Stanford University(Baylor Coll. Med.)

Rutgers University European Bioinformatics Institute

Unified Data Resource for 3DEM

EMDataResource is funded by the US National Institutes of Health/National Institute of General Medical Science, R01GM079429-12

Wah ChiuGreg PintilieMike Schmid

Steven LudtkeMatt BakerCorey HrycIan Rees

UC Davis:Andriy Kryshtafovych

Cathy LawsonHelen BermanBrinda VallatBrian Hudson

John WestbrookBatsal DevkotaRaul SalaChunxiao Bi

Ardan PatwardhanGerard KelywegtSanja AbbottRyan PyeOsman SalihZhe Wang

Kim HenrickRichard NewmanChristoph BestGlen van GinkelEduardo Sanz-GarciaIngvar Lagerstedt

Page 38: EMDataResource: Structure Data Archiving, Validation Challenges · 2020-06-12 · Project Website emdataresource.org Updated every Wednesday. Growth of EM Archives ... • Coordinates

References

■ Manuscript preprint will be available shortly at BioRxiv: Outcomes of the 2019 EMDataResource model challenge: validation of cryo-EM models at near-atomic resolution

■ Lawson CL, Berman HM, Chiu W (2020). Evolving Data Standards for cryo-EM Structures. Struct. Dynamics 7, 014701. 10.1063/1.5138589

■ Lawson CL, Chiu W (2018) Comparing cryo-EM structures (Editorial). J Struct Biol. 204, 523-526. 10.1016/j.jsb.2018.10.004

■ Patwardhan A & Lawson CL (2016). Databases and Archiving for CryoEM. Methods Enzymol 579, 393-412. 10.1016/bs.mie.2016.04.015

■ Lagerstedt I, et al (2013). Web-based visualisation and analysis of 3D electron-microscopy data from EMDB and PDB. J Struct Biol. 184, 173-81. 10.1016/j.jsb.2013.09.021

■ Henderson R, et al (2012) Outcome of the first electron microscopy validation task force meeting. Structure 20, 205-214. 10.1016/j.str.2011.12.014