wwPDB Common D&A Project, January 28, 2010

Upload: wanda-barrett

Post on 31-Dec-2015

TRANSCRIPT

Page 1: wwPDB Common D&A Project  January 28, 2010

Worldwide Protein Data Bank

www.wwpdb.org

wwPDB Common D&A Project January 28, 2010

Steering Committee

Project Update

Page 2: wwPDB Common D&A Project  January 28, 2010

Common D&A Project January 2010 Update

Update report

Status of D&A initial production deliverable:
– Sequence Editor tool development
– Integration within existing pipelines

Status of WF infrastructure initial implementation:
– Sequence Processing components (external search, internal analysis, etc.) integrated by WF engine and manager into the “new” Sequence Processing Module.
– Integration of Sequence Processing Module into existing pipeline.

RECONSIDER Timeline Estimate and Strategy

Next Phase
– Ligand Processing: Planning

Page 3: wwPDB Common D&A Project  January 28, 2010

Overview of deliverable status for: Sequence Editor tool

Deliverable timelines have been extended to enable a full response to user testing input (expanded requirements) and to ensure development to the agreed-upon design.

Completion of Interface with additional prioritized requirements: projected Feb 15

Integration within current production pipelines
– Initial implementation of Master Format and format conversion support

In use by annotators by Feb 25

Page 4: wwPDB Common D&A Project  January 28, 2010

Sequence Editor Tool Technologies and Standards

Model View Controller (MVC) Design
– Separates data/application from presentation as much as possible

Client/Server protocol
– AJAX using JSON protocol
– REST style service definitions

Server
– Apache with embedded WSGI (mod_wsgi)

Application
– Python with C++ extensions (Boost/Python)

All the good acronyms!
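
To make the stack above concrete, here is a minimal sketch of a REST-style JSON endpoint written as a plain WSGI callable, the kind of object mod_wsgi can serve. The `/sequences/<chain>` route, the `SEQUENCES` store, and the `call` helper are assumptions for illustration only; the slides do not describe the actual service definitions.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory store; the real sequence data store is not described in the slides.
SEQUENCES = {"A": "MKTAYIAKQR"}


def application(environ, start_response):
    """WSGI entry point. mod_wsgi looks for a callable named `application` by default."""
    method = environ.get("REQUEST_METHOD", "GET")
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    # Hypothetical route: GET /sequences/<chain> returns one sequence as JSON.
    if method == "GET" and len(parts) == 2 and parts[0] == "sequences" and parts[1] in SEQUENCES:
        status = "200 OK"
        payload = {"chain": parts[1], "sequence": SEQUENCES[parts[1]]}
    else:
        status = "404 Not Found"
        payload = {"error": "not found"}

    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path, method="GET"):
    """Invoke the app in-process (no Apache needed) and return (status, decoded JSON)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

A browser-side AJAX call would fetch the same JSON over HTTP; `call("/sequences/A")` exercises the handler without a server.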

Page 5: wwPDB Common D&A Project  January 28, 2010

Sequence Editor Tool: Architecture for Current and Future Deployment

[Architecture diagram. Labels: Sequence Data Store; Current DP Pipeline; Future Workflow DP Pipeline; WFE/WFM; Sequence Editor Tool; Annotated Sequence Data; file formats PDB/FASTA, PDBx/PreBlast, PDB/PDBx.]

Page 6: wwPDB Common D&A Project  January 28, 2010

Accomplishments

Annotator graphical interface for Sequence Editing
– Prototype evaluation and prioritization of additional requirements by Annotators at all sites completed Jan 12
– Expanded functionality development expected to be completed and available for user testing Feb. 15, including:
  • Capability to incrementally undo a process step (UNDO)
  • Summarization of sequence conflicts
  • Global editing features

Integration of this Sequence Editor tool (interface) into the existing data processing pipelines (Feb 26)
– Input accepts existing sequence data files at PDBe and RCSB (e.g. PDBx + Blast report or PDB + FASTA)
– Output via an intermediate file, to be integrated via Maxit
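
One common way to support incremental UNDO is a snapshot stack. The sketch below is an assumption about how such a feature could work, not the Sequence Editor's actual implementation; `EditHistory` and `substitute` are hypothetical names.

```python
class EditHistory:
    """Keeps prior sequence states so each edit step can be undone one at a time."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._undo_stack = []  # earlier states, most recent last

    @property
    def sequence(self):
        return self._sequence

    def substitute(self, position, residue):
        """Replace the residue at a 0-based position, recording the prior state."""
        if not 0 <= position < len(self._sequence):
            raise IndexError("position outside sequence")
        self._undo_stack.append(self._sequence)
        self._sequence = (self._sequence[:position] + residue
                          + self._sequence[position + 1:])

    def undo(self):
        """Revert the most recent edit. Returns False when nothing is left to undo."""
        if not self._undo_stack:
            return False
        self._sequence = self._undo_stack.pop()
        return True
```

Storing whole snapshots is simple but memory-hungry for long sequences; storing inverse operations instead is the usual alternative.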

Page 7: wwPDB Common D&A Project  January 28, 2010

Accomplishments

Master Format implementation (for current data model)
– PDB to Master Format translation working with MAXIT; Final Test at PDBe
– Validation and testing at all sites.
– PDBj creation of new tool for Master Format Validation with extended diagnostics.
– Issues with Master Format will be ongoing, with evolution of the PDB format, Hybrid methods, etc.

Page 8: wwPDB Common D&A Project  January 28, 2010

Sequence Editor Tool Development: Lessons Learned

Iterative development and active Annotator involvement are essential, and take time.

Addressing integration issues with existing systems in terms of modularity, process ordering and data availability poses significant challenges.

Agile process of development and planning supports adaptation to evolving requirements.

We will need to further consider the most efficient level of granularity for the deployment of new functionality in existing systems in future planning.

Page 9: wwPDB Common D&A Project  January 28, 2010

Design Convergence Accomplishments: Master Format, API, WFM, WFE, UI

Page 10: wwPDB Common D&A Project  January 28, 2010

Accomplishments: WF infrastructure - Integration of Sequence Processing

Tracking and Status DB developed and installed at RCSB and PDBe for development purposes.

Work Flow Manager (WFM)
– Prototype user testing ongoing
– Requirements refined and prototype updated
– Infrastructure complete; to be deployed for testing this week

Work Flow Manager User Interface (WFM UI)
– User prototype created, input received and prototype enhanced
– Initial Level 1 annotator interface signed off by annotators
– Level 2/3/4 interfaces prototyped and under review
– Level 3/4 under further development

Page 11: wwPDB Common D&A Project  January 28, 2010

PDBe resource

Workflow XML
– Luana/Tom: 1 day total to complete annotator requirements

WFE component supporting Sequence Processing
– Tom: 1-2 days per week ongoing, estimating 5-6 days (3 actual weeks) to complete after all APIs are in place

WFM
– Luana: currently full time; work is being prioritised to define the subset of requirements to be delivered in March.

Web resources: interfaces and WFM
– External services: technology requirements have been defined. Timeline tbd. Critical Path.

Other resources
– Wim: Python expertise
– Swanand: Python expertise (after 13th Feb), fall-back

Page 12: wwPDB Common D&A Project  January 28, 2010

RCSB Resources

Web Tools
– Currently supporting development and alpha-testing sites
– Will add production site for Feb deployment

Database Support
– MySQL database server for status and tracking database

Application Support
– Project SVN code repository
– JIRA issue tracking system
– Project documentation and information site (Drupal)
– Automated build system for API and application tools

People
– Vladimir: API and build system (Python/C++)
– Li: DB system and status and tracking API (Python/SQL)
– Rahip: Sequence Editor Tool (JavaScript/CSS)
– Zukang/Raul/John: DP applications (C++/Python)

Page 13: wwPDB Common D&A Project  January 28, 2010

Updated Timeline Summary

Sequence Processing

1. Sequence Editor Tool
– Completion of Interface with prioritized additional requirements and beginning of final user testing: projected Feb 15
– Integration with current pipelines using Master Format. In test by annotators by Feb 25
– In production: best estimate early March

2. Integration of Sequence processing components with new architecture (WFE/API and WFM)
– User testing: April

3. Integration of module into Pipeline
– Plan by end of March

Page 14: wwPDB Common D&A Project  January 28, 2010

Competing/Complementary Priorities

Address on-going data quality issues and remediation

Three Validation task forces
– Implementation of recommendations

New PDB Format: within the next 6 months?

De-programming Kim
– For Ligand Processing: timeline end of March to early April

Other strategic considerations

Stakeholders
– Stress testing of new solutions against expectations and existing solutions must be managed and will take some time.

Page 15: wwPDB Common D&A Project  January 28, 2010

Next Phase - Timeline

Ligand Processing Requirements
– Plans in place for Annotator exchange
– March: requirements consolidation, initial design plan
– March: create overview plan and initial timeline

Kick off development

Deployment
– Strategy to be defined based on current and ongoing lessons learned.

Page 16: wwPDB Common D&A Project  January 28, 2010

Things that have kept us up at night

These are cornerstone deliverables requiring intense study and design consideration, beyond the proof of concept.
– Organization of data, communication protocols, etc.
– Clear consensus of design features has required an evolution of understanding, requiring wetting of hands

Ramp up of skill sets: Python, mmCIF (PDBe)

EBI External services: web-service set up

Site specific integration challenges

Resource issues

Page 17: wwPDB Common D&A Project  January 28, 2010

BACK UP SLIDES

Page 18: wwPDB Common D&A Project  January 28, 2010

Data and Application API Design

Unified Python language implementation

Provides all access to data and applications for the workflow manager and workflow engine

Subcomponents of the API provide access to:
– Data objects and data values
– Applications and tools
– Tracking and status information
– Site level configuration information
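
The four subcomponents could be grouped behind a single Python entry point, roughly as sketched below. Every class, method, and identifier here (`WorkflowAPI`, `DataAccess`, the deposition ID `D_1`, and so on) is hypothetical and chosen for illustration; the slides do not show the real API.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccess:
    """Data objects and data values, keyed by (deposition ID, item name)."""
    values: dict = field(default_factory=dict)

    def set(self, deposition_id, item, value):
        self.values[(deposition_id, item)] = value

    def get(self, deposition_id, item):
        return self.values[(deposition_id, item)]


@dataclass
class ToolRegistry:
    """Applications and tools, registered as plain callables."""
    tools: dict = field(default_factory=dict)

    def register(self, name, func):
        self.tools[name] = func

    def run(self, name, *args):
        return self.tools[name](*args)


@dataclass
class StatusTracker:
    """Tracking and status information per deposition."""
    status: dict = field(default_factory=dict)

    def update(self, deposition_id, state):
        self.status[deposition_id] = state

    def current(self, deposition_id):
        return self.status.get(deposition_id, "unknown")


@dataclass
class SiteConfig:
    """Site level configuration information."""
    settings: dict = field(default_factory=dict)

    def get(self, key, default=None):
        return self.settings.get(key, default)


@dataclass
class WorkflowAPI:
    """Single access point for the workflow manager and workflow engine."""
    data: DataAccess = field(default_factory=DataAccess)
    tools: ToolRegistry = field(default_factory=ToolRegistry)
    status: StatusTracker = field(default_factory=StatusTracker)
    config: SiteConfig = field(default_factory=SiteConfig)
```

Routing every call through one facade keeps the workflow engine independent of where each site stores its data or installs its tools.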

Page 19: wwPDB Common D&A Project  January 28, 2010

Deliverable update: WFM Design

Functional Architectural design

Will present progress and tracking information

Will start/stop and restart the workflow engine in executing data processing tasks

Will work in a fully distributed web-based mode

Will provide a launch point for tasks requiring interactive or graphical interactions. Two modes defined:
• Immediate mode: all processing occurs in a single session (simple case).
• Deferred mode: requests for input are registered with the workflow manager for later processing by annotator
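
The difference between the two modes can be sketched with a simple queue. This is an illustrative assumption, not the WFM design itself; `WorkflowManager`, `request_input`, and `resolve_next` are hypothetical names.

```python
from collections import deque


class WorkflowManager:
    """Toy model of immediate versus deferred handling of input requests."""

    def __init__(self):
        self._deferred = deque()  # requests waiting for an annotator

    def request_input(self, task_id, prompt, answer=None):
        """Immediate mode when an answer is supplied in-session; otherwise defer."""
        if answer is not None:
            return {"task": task_id, "mode": "immediate", "answer": answer}
        self._deferred.append((task_id, prompt))
        return {"task": task_id, "mode": "deferred", "answer": None}

    def pending(self):
        """Requests registered for later processing, oldest first."""
        return list(self._deferred)

    def resolve_next(self, answer):
        """An annotator answers the oldest deferred request in a later session."""
        task_id, _prompt = self._deferred.popleft()
        return {"task": task_id, "mode": "deferred", "answer": answer}
```

In this sketch a deferred request simply waits in memory; a production system would presumably persist it in the tracking and status database so it survives a restart.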

Page 20: wwPDB Common D&A Project  January 28, 2010

Process Overview

With GO BACK functionality