Worldwide Protein Data Bank www.wwpdb.org Common D&A Project Sequence Processing Modular Demo May 6, 2010 Project Deliverable

Upload: allison-marshall

Post on 19-Jan-2016


Page 1

Worldwide Protein Data Bank

www.wwpdb.org

Common D&A Project

Sequence Processing Modular Demo

May 6, 2010 Project Deliverable

Page 2

Annotation Pipeline

[Diagram: annotation pipeline modules and project tracks. Recoverable labels:]

Modules: Ligand Processing; Release Processing; Geometry CK Validation; Calculated annotations (Bio Assem); Corrections (water trans, pro-chiral ck); Sequence Processing Module

Tracks: User Interface; WFE/API; Requirements; Design; Progress Tracking/Status

Other labels: 4.1, 4.2, 4.3, 4.4, 4.5; Delivered May 6, 2010

Page 3

Common D&A Project March 2010 Project Team Meeting

Technical Deliverable Details

• Master Format. Finalization of Physical Data Exchange
• Extended API
• Tracking DB creation/support
• Extended Work Flow Engine (WFE)
• Work Flow Manager (WFM)
• Work Flow Manager User Interface (WFM UI)
• Annotator graphical interface for sequence module
• Integration of all components creating the Sequence Processing “module”

Page 4

Key Requirements Met

• Complete and “correct” entries processed automatically
• Sequence mutation – editing and visualization supported
• Sequence mismatch – editing and visualization supported
• Processing of very large structures, e.g. the ribosome
• Polymer processing, individual and in complex
• Short peptide complex cross reference
• Sequence matches sortable by % match
• Annotator-triggered global ALA/GLY substitutions
• Self-reference supported for cases with no UniProt match
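The "sortable by % match" requirement can be illustrated with a minimal sketch. The slides do not say how the module computes or stores a match score, so the `SequenceMatch` record, its fields, and the percent-identity formula below are all assumptions made for illustration, and the accessions are placeholders rather than real database entries.

```python
from dataclasses import dataclass


@dataclass
class SequenceMatch:
    """One candidate reference hit for a deposited chain (hypothetical record)."""
    accession: str        # placeholder identifier, not a real database entry
    aligned_length: int   # number of aligned positions
    identical: int        # number of identical positions in the alignment

    @property
    def percent_match(self) -> float:
        # Assumed definition: percent identity over the aligned region.
        # An empty alignment scores 0.0 instead of dividing by zero.
        if self.aligned_length == 0:
            return 0.0
        return 100.0 * self.identical / self.aligned_length


def sort_by_percent_match(matches):
    """Return a new list of matches ordered best-first by percent match."""
    return sorted(matches, key=lambda m: m.percent_match, reverse=True)


hits = [
    SequenceMatch("REF_A", aligned_length=200, identical=150),
    SequenceMatch("REF_B", aligned_length=200, identical=198),
    SequenceMatch("REF_C", aligned_length=0, identical=0),
]
ranked = sort_by_percent_match(hits)
for hit in ranked:
    print(f"{hit.accession}: {hit.percent_match:.1f}% match")
```

Sorting on a derived property keeps the raw alignment counts available, so an annotator-facing table could in principle be re-sorted on other columns without recomputing anything.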

Page 5

Future Enhancement List

• Automation of “gap” recognition and processing*
• Implementation of UniProt isoform and variant searches for mismatched proteins*
• Validation and checks within the Sequence Editor
• Modified residues – support one-to-many sequence alignments (e.g. chromophore)
• Chimera processing
• Concanavalin A example (alternate splicing)

*PDBe code to be packaged for module integration

Page 6

Sequence Module Processing

Page 7

Sequence Module Processing

Page 8

Under the covers…

Page 9

Data Management

File Prior to Seq. Processing

File After Seq. Processing Database Search

Page 10

Data Management

File System After Seq. Processing Editor Task: New results returned to archival storage …

Page 11

System Architecture

[Architecture diagram. Recoverable labels, in extraction order:]

Workflow Manager User Interface
Workflow engine
Session ID + workflowID
Domain data archive (local)
API
• Start/Stop
• Launch module UIs
Depositions
Remote data – Snap Mirror share
Applications
Status
Data
View system activity – Tracking DB
Tasks
Tracking DB
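The architecture labels suggest that work is identified by a session ID plus a workflow ID, that Start/Stop operations are exposed, and that a Tracking DB records status for viewing system activity. The sketch below is one way those pieces could fit together; it is not the project's code. The class names, method names, and status values are hypothetical, and an in-memory dictionary stands in for the real Tracking DB.

```python
from enum import Enum


class TaskStatus(Enum):
    # Illustrative status names; the slides do not list the real values.
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    FINISHED = "finished"


class TrackingDB:
    """In-memory stand-in for the Tracking DB (hypothetical interface)."""

    def __init__(self):
        # Rows are keyed by the (session ID, workflow ID) pair from the diagram.
        self._rows = {}

    def set_status(self, session_id, workflow_id, status):
        self._rows[(session_id, workflow_id)] = status

    def get_status(self, session_id, workflow_id):
        return self._rows[(session_id, workflow_id)]

    def activity(self):
        # What a "view system activity" screen might list, sorted for display.
        return sorted((s, w, st.value) for (s, w), st in self._rows.items())


class WorkflowEngine:
    """Toy engine exposing the Start/Stop operations named in the diagram."""

    def __init__(self, tracking):
        self.tracking = tracking

    def start(self, session_id, workflow_id):
        self.tracking.set_status(session_id, workflow_id, TaskStatus.RUNNING)

    def stop(self, session_id, workflow_id):
        self.tracking.set_status(session_id, workflow_id, TaskStatus.STOPPED)


db = TrackingDB()
engine = WorkflowEngine(db)
engine.start("session-1", "seq-processing")
engine.start("session-2", "seq-processing")
engine.stop("session-2", "seq-processing")
print(db.activity())
```

Routing every state change through the tracking store means a manager interface only has to read from one place to show current activity, which matches the "View system activity – Tracking DB" label.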

Page 12

THE DEMO

• A brief walk about the WFM
• The System at Work
– Selection of a raw file within the WFM
– Trigger Sequence Processing Interface
   Processing options
– Tracking by the WFM of the task status
– Blessing of the output

Page 13

System Extensibility: Set up for adding New Functionality

Page 14

Next Steps

• Sequence Processing Module
– Sequence Processing Module to go into targeted testing
– Modifications to be adopted as prioritized by the team and approved by the PIs
– User Manual development
• Ligand Processing
– Finalize requirements
– Develop design
– Develop module, with delivery target end of August

Page 15

Process Overview - Ligand Processing

[Process diagram. Recoverable labels:]

Step 1.0 Deposition Format Check
Step 2.0 Sequence Processing
Step 3.0 Ligand Processing
Step 4.0 Calculation of Derived data
Step 5.0 Corrections: Water trans, pro-chiral ck
Step 6.0 Calculated Annotation - Biological Assembly
Step 7.0 Geometry Ck Validation
Step 8.0 Release processing, Generate Files
Step 9 Send to Authors

WFE, API, WFM

Graphical User Interface
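The numbered steps on this slide can be captured as an ordered table, which is roughly what a workflow engine would need in order to decide what runs next. The sketch below assumes the steps execute strictly in numeric order; the numbering suggests this, but the slide does not state it. The tuple layout and the `next_step` helper are illustrative and not part of the project's API.

```python
# Step numbers and names come from the process overview slide;
# the data layout and helper function are illustrative assumptions.
PIPELINE_STEPS = [
    ("1.0", "Deposition Format Check"),
    ("2.0", "Sequence Processing"),
    ("3.0", "Ligand Processing"),
    ("4.0", "Calculation of Derived data"),
    ("5.0", "Corrections: Water trans, pro-chiral ck"),
    ("6.0", "Calculated Annotation - Biological Assembly"),
    ("7.0", "Geometry Ck Validation"),
    ("8.0", "Release processing, Generate Files"),
    ("9", "Send to Authors"),
]


def next_step(current):
    """Return the (number, name) pair after `current`, or None at the end.

    Assumes a strictly linear pipeline; raises ValueError for an unknown step.
    """
    numbers = [number for number, _ in PIPELINE_STEPS]
    index = numbers.index(current)
    if index + 1 < len(PIPELINE_STEPS):
        return PIPELINE_STEPS[index + 1]
    return None


print(next_step("2.0"))
print(next_step("9"))
```

Keeping the step list as data rather than hard-coded control flow makes it straightforward to slot in a new module, such as Ligand Processing, without touching the code that advances the workflow.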

Page 16

Ligand Processing – Functional Requirements

• Annotator exchange – experience with, and analysis of, existing work flows
• Draft of new TO BE process – Level 1
• Annotator Team elaborated - Level 2,3
• Annotator Team created decision trees and SIPOCs for all process steps
• Annotators documented key Use Cases
• Annotator Team mapped existing functional software components to the proposed workflow components
• Annotator Team created interface mock-ups for interactive components

Page 17

Ligand Processing – Technical Requirements and Design

• Create Plan, identify resources
• Tech Team to review the requirements
• Review Functional software components
• Capture technical requirements
• Complete the draft design for the Ligand processing module
• Develop module

Page 18

Project Team