Worldwide Protein Data Bank
www.wwpdb.org
Common D&A Project
Sequence Processing Modular Demo
May 6, 2010 Project Deliverable
Annotation Pipeline (diagram)
Modules: Sequence Processing Module (delivered May 6, 2010), Ligand Processing, Calculated Annotations (Bio Assem), Corrections (water trans, pro-chiral ck), Geometry Ck Validation, Release Processing
Supporting components: User Interface, WFE/API
Other labels: Requirements, Design, Progress Tracking/Status, 4.1, 4.2, 4.3, 4.4, 4.5
Worldwide Protein Data Bank
Common D&A Project March 2010 Project Team Meeting
Technical Deliverable Details
• Master Format. Finalization of Physical Data Exchange
• Extended API
• Tracking DB creation/support
• Extended Work Flow Engine (WFE)
• Work Flow Manager (WFM)
• Work Flow Manager User Interface (WFM UI)
• Annotator graphical interface for sequence module
• Integration of all components creating the Sequence Processing “module”
Key Requirements Met
• Complete and “correct” entries processed automatically
• Sequence mutation – editing and visualization supported
• Sequence mismatch – editing and visualization supported
• Processing of very large structures, e.g. ribosome
• Polymer processing, individual and in complex
• Short peptide complex cross reference
• Sequence matches sortable by % match
• Annotator-triggered global ALA/GLY substitutions
• Support self reference for cases with no UniProt match
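The "sortable by % match" requirement can be illustrated with a minimal Python sketch. It is not the module's actual code: the `SequenceMatch` class, its fields, the reference identifiers, and the definition of percent match (identical positions divided by aligned length) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceMatch:
    """One candidate reference match (hypothetical structure)."""
    reference_id: str      # placeholder identifier, not a real accession
    identical: int         # number of identical aligned positions
    aligned_length: int    # total number of aligned positions

    @property
    def percent_match(self) -> float:
        # Assumed definition: identical positions over aligned length.
        if self.aligned_length == 0:
            return 0.0
        return 100.0 * self.identical / self.aligned_length


def sort_by_percent_match(matches):
    """Return matches ordered from highest to lowest percent match."""
    return sorted(matches, key=lambda m: m.percent_match, reverse=True)


matches = [
    SequenceMatch("REF_A", identical=90, aligned_length=100),
    SequenceMatch("REF_B", identical=100, aligned_length=100),
    SequenceMatch("REF_C", identical=45, aligned_length=60),
]
ranked = sort_by_percent_match(matches)
for m in ranked:
    print(f"{m.reference_id}: {m.percent_match:.1f}%")
```

With this ordering an annotator would see the best candidate first; a real implementation may define the score differently or break ties on other criteria.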
Future Enhancement List
• Automation of “gap” recognition and processing*
• Implementation of UniProt isoform, variant searches for mismatched proteins*
• Validation and checks within the Sequence Editor
• Modified residues – support one-to-many sequence alignments (e.g. chromophore)
• Chimera processing
• Concanavalin A example (alternate splicing)
*PDBe code to be packaged for module integration
Sequence Module Processing
Under the covers…
Data Management
File Prior to Seq. Processing
File After Seq. Processing Database Search
Data Management
File System After Seq. Processing Editor Task: New results returned to archival storage …
System Architecture (diagram; labels recovered from the slide)
• Workflow Manager User Interface
• Workflow engine: Session ID + workflowID
• API
– Start/Stop
– Launch module UIs
• Domain data archive (local)
• Depositions
• Remote data – Snap Mirror share
• View system activity – Tracking DB
• Tracking DB
• Other labels: Applications, Status, Data, Tasks
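As a rough illustration of how task status might be recorded against a session ID and workflow ID, here is a minimal Python sketch using an in-memory SQLite database. The table layout, column names, status strings, and helper functions are all hypothetical; the slides do not describe the actual Tracking DB schema.

```python
import sqlite3


def create_tracking_db() -> sqlite3.Connection:
    """Create an in-memory tracking table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE task_status (
            session_id  TEXT NOT NULL,
            workflow_id TEXT NOT NULL,
            task        TEXT NOT NULL,
            status      TEXT NOT NULL,
            PRIMARY KEY (session_id, workflow_id, task)
        )
        """
    )
    return conn


def set_status(conn, session_id, workflow_id, task, status):
    """Insert or overwrite the status of one task."""
    conn.execute(
        "INSERT OR REPLACE INTO task_status VALUES (?, ?, ?, ?)",
        (session_id, workflow_id, task, status),
    )
    conn.commit()


def get_status(conn, session_id, workflow_id, task):
    """Return the recorded status, or None if the task is unknown."""
    row = conn.execute(
        "SELECT status FROM task_status "
        "WHERE session_id = ? AND workflow_id = ? AND task = ?",
        (session_id, workflow_id, task),
    ).fetchone()
    return row[0] if row else None


conn = create_tracking_db()
set_status(conn, "session-1", "workflow-1", "sequence_processing", "running")
set_status(conn, "session-1", "workflow-1", "sequence_processing", "finished")
print(get_status(conn, "session-1", "workflow-1", "sequence_processing"))
```

A user interface could poll such a table to display system activity; the real system may use a different database and a richer status model.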
THE DEMO
• A brief walk about the WFM
• The System at Work
– Selection of a raw file within the WFM
– Trigger Sequence Processing Interface
  Processing options
– Tracking by the WFM of the task status
• Blessing of the output
System Extensibility: Set up for adding New Functionality
Next Steps
• Sequence Processing Module
– Sequence Processing Module to go into targeted testing
– Modifications to be adopted as prioritized by the team and approved by the PIs
– User Manual development
• Ligand Processing
– Finalize requirements
– Develop design
– Develop module with delivery target end of August
Process Overview - Ligand Processing (diagram)
Step 1.0 Deposition Format Check
Step 2.0 Sequence Processing
Step 3.0 Ligand Processing
Step 4.0 Calculation of Derived Data
Step 5.0 Corrections (water trans, pro-chiral ck)
Step 6.0 Calculated Annotation - Biological Assembly
Step 7.0 Geometry Ck Validation
Step 8.0 Release Processing, Generate Files
Step 9 Send to Authors
Supporting layers: WFE, API, WFM, Graphical User Interface
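The numbered steps in the process overview can be written down as an ordered enumeration. The Python sketch below assumes the steps run strictly in numeric order, which the slide suggests but does not state; the `PipelineStep` names and the `next_step` helper are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class PipelineStep(IntEnum):
    """Annotation pipeline steps, numbered as on the slide."""
    DEPOSITION_FORMAT_CHECK = 1
    SEQUENCE_PROCESSING = 2
    LIGAND_PROCESSING = 3
    CALCULATION_OF_DERIVED_DATA = 4
    CORRECTIONS = 5
    CALCULATED_ANNOTATION_BIOLOGICAL_ASSEMBLY = 6
    GEOMETRY_CHECK_VALIDATION = 7
    RELEASE_PROCESSING = 8
    SEND_TO_AUTHORS = 9


def next_step(step: PipelineStep) -> Optional[PipelineStep]:
    """Return the step that follows, or None after the final step.

    Assumes a strictly linear pipeline, which is an illustration only.
    """
    steps = list(PipelineStep)  # iterates in definition order
    index = steps.index(step)
    return steps[index + 1] if index + 1 < len(steps) else None


print(next_step(PipelineStep.SEQUENCE_PROCESSING).name)
```

A real workflow engine would likely allow branching, retries, and annotator intervention between steps; the linear model is only the simplest reading of the diagram.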
Ligand Processing – Functional Requirements
• Annotator exchange – experience with, and analysis of, existing work flows
• Draft of new TO BE process – Level 1
• Annotator Team elaborated – Level 2, 3
• Annotator Team created decision trees and SIPOCs for all process steps
• Annotators documented key Use Cases
• Annotator Team mapped existing functional software components to the proposed workflow components
• Annotator Team created interface mock-ups for interactive components
Ligand Processing – Technical Requirements and Design
• Create plan, identify resources
• Tech Team to review the requirements
• Review functional software components
• Capture technical requirements
• Complete the draft design for the Ligand Processing module
• Develop module
Project Team