memops data modelling and automatic code generation

Memops Data modelling and automatic code generation Edinburgh 9 September 2008


Page 1: Memops Data modelling and  automatic code generation

Memops

Data modelling and automatic code generation

Edinburgh 9 September 2008

Memops - main points

■ Code generation framework
■ Data access subroutine libraries
■ Fully automatic code generation from model
■ Several programming languages in parallel
■ Precise, detailed, validated data

Memops

● Introduction
● Code generation
● Generated libraries
● Applications of Memops

The CCPN Project

■ Collaborative Computing Project for NMR
■ Since 1999
■ Unifying platform for NMR software, similar to CCP4 for X-ray crystallography
■ Community-based, open-source software development
■ Code generation, data model, applications, meetings

NMR Structural Biology Pipeline

[Figure: pipeline diagram with the stages Sample Preparation, NMR Machine, Data Processing, Spectrum Analysis, Structure Calculation and Repository Database. Annotation: "Slow, complex, interactive".]

Native Anarchy

[Figure: several programs for Task1, Task2 and Task3, connected to one another by many separate "Convert" steps.]

With Data Standard

[Figure: the programs for Task1, Task2 and Task3, each with a "Convert" step, arranged around a central Data Standard.]
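The contrast between the two diagrams can be made concrete with a little arithmetic. The sketch below is not from the slides: it assumes that, without a standard, every format needs a dedicated one-way converter to every other format, and that with a standard each format needs only one reader and one writer. The function names are made up for illustration.

```python
def pairwise_converters(n_formats):
    """Converters needed when every format is translated directly into every other."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats):
    """Converters needed when each format has one reader and one writer to a shared standard."""
    return 2 * n_formats


# Compare the two counts for a few numbers of formats.
for n in (3, 6, 30):
    print(n, pairwise_converters(n), hub_converters(n))
```

Under these assumptions the direct approach grows quadratically with the number of formats, while the hub approach grows linearly.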

Data standard - objectives

● Lossless data transfer between programs
  - different approaches and architectures
● All data needed for pipeline software
  ■ Creating data, not analysing end results
  ■ Intermediate results needed
  ■ Comprehensive, detailed, complex
● Completeness, integrity of changing data
● Precisely defined standard
  ■ A single central description
  ■ Validation directly against standard

CCPN approach

■ Standard API, no stable format
  ● easier to maintain as model changes
■ Abstract data model
  ● Exact correspondence to APIs
■ API implementations for several languages
■ Transparent access to XML or DB storage
■ Complete validation of model rules and constraints

Memops

● Introduction
● Code generation
● Generated libraries
● Applications of Memops

Automatic Code generation

■ Model will change over time
  ● Several parallel implementations
  ● Synchronisation between APIs and model
  ● Maintenance and debugging
  ● Resources are limited
■ Automatic Code Generation
  ● Write and debug once and for all
  ● Any domain, from Astrophysics to Zoology
  ● Quick and simple to extend model
    ■ E.g. Application-specific packages

Code Generation Framework

[Figure: overview of the MEMOPS framework. Labels shown: Domain Experts, Software Developers, User; UML Model (Package 1, Package 2, Package 3); Autogeneration; APIs (Python, Java, C); Storage (SQL, XML); Documentation; Application; Deposition; Wrappers; Handcoded (< 1%).]

Code Generation

[Figure: code generation workflow. Labels shown: ObjectDomain (edit UML, UML data); Export; MetaModel (in-memory model, Python objects); on-disk model (XML file); Autogeneration; API code, schemas, mappings, etc. Legend: CCPN code, off-the-shelf, files, CCPN generated.]

API generator

[Figure: class diagram of the generator modules: ModelTraverse, TextWriter, ApiGen, PyLanguage, FileApiGen, PyApiGen, PyType, PyFileApiGen.]

• Written in Python
• Modular
• Different generators share code
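To illustrate the idea of generating API code from a model, here is a minimal sketch. It is not the Memops generator: the `MODEL` dictionary, the helper names and the generated class layout are all assumptions made for this example, and the real metamodel is far richer.

```python
# Hypothetical model description: class name -> {attribute name: Python type name}.
MODEL = {
    "Atom": {"name": "str", "elementSymbol": "str"},
    "Coord": {"x": "float", "y": "float", "z": "float"},
}


def capitalise(name):
    """Upper-case the first letter only, so 'elementSymbol' becomes 'ElementSymbol'."""
    return name[0].upper() + name[1:]


def generate_class(class_name, attributes):
    """Return Python source for one class with type-checked get/set methods."""
    lines = [f"class {class_name}:", "    def __init__(self):", "        pass"]
    for attr in attributes:
        lines.append(f"        self._{attr} = None")
    for attr, type_name in attributes.items():
        cap = capitalise(attr)
        lines.append(f"    def get{cap}(self):")
        lines.append(f"        return self._{attr}")
        lines.append(f"    def set{cap}(self, value):")
        lines.append(f"        if not isinstance(value, {type_name}):")
        lines.append(f"            raise TypeError('{class_name}.{attr} must be {type_name}')")
        lines.append(f"        self._{attr} = value")
    return "\n".join(lines) + "\n"


def generate_api(model):
    """Generate source for every class in the model and execute it into a namespace."""
    source = "\n".join(generate_class(name, attrs) for name, attrs in model.items())
    namespace = {}
    exec(source, namespace)
    return namespace


api = generate_api(MODEL)
atom = api["Atom"]()
atom.setName("CA")
print(atom.getName())
```

Adding an attribute to `MODEL` and regenerating is enough to get a new validated getter and setter, which is the maintenance argument the slides make for generating code rather than writing it by hand.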

Memops

● Introduction
● Code generation
● Generated libraries
● Applications of Memops

Model features

■ Packages to subdivide model, code, and data files
■ Objects. Unique context, compare-by-identity
■ Complex data types. Different contexts, compare-by-value
■ Simple data types, PositiveInt, enumerations, …
■ Attributes and links:
  ● Cardinality, frozen/modifiable, derived
  ● Unique/ordered collections (sets, lists, unique lists)
■ Ad-hoc constraints on attributes, simple and complex datatypes, and objects.
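The difference between compare-by-identity objects, compare-by-value complex data types and constrained simple data types can be sketched in plain Python. The class names and the exact PositiveInt rule (value greater than zero) are assumptions for illustration, not the generated Memops code.

```python
from dataclasses import dataclass


def check_positive_int(value):
    """Constraint for a PositiveInt-style simple data type (assumed rule: integer > 0)."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{value!r} is not a positive integer")
    return value


@dataclass(frozen=True)
class ExampleUrl:
    """Complex data type: compared by value, usable in several contexts."""
    host: str
    path: str


class ExampleObject:
    """Object: lives in one context and is compared by identity (default Python behaviour)."""

    def __init__(self, serial):
        self.serial = check_positive_int(serial)


same_value = ExampleUrl("example.org", "/data") == ExampleUrl("example.org", "/data")
same_object = ExampleObject(1) == ExampleObject(1)
print(same_value, same_object)
```

Two complex data type instances with equal fields compare equal, while two objects with the same attribute values remain distinct.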

Molstructure model package

[Figure: UML class diagram. Classes shown:]

StructureEnsemble
  +ensembleId: Int
  +atomNamingSystem: Line
  +resNamingSystem: Line
  +getEnsembleValidations()

Chain
  +code: Line
  +getChain()

Model
  +serial: Int
  +name: Line
  +details: Text

Coord
  +altLocationCode: Line =
  +x: Float
  +y: Float
  +z: Float
  +bFactor: Float = 0.0
  +occupancy: Float = 1.0

Residue
  +seqId: Int
  +seqCode: Int
  +seqInsertCode: Line =
  +getResidue()

Atom
  +name: Word
  +elementSymbol: Word
  +getAtom()
  +getElementSymbol()
  +getChemAtom()

Linked classes in other packages: ccp.molecule.MolSystem.MolSystem (+code: Word, +name: Text, +keywords: Line, ...), ccp.molecule.MolSystem.Chain, ccp.molecule.MolSystem.Residue, ccp.molecule.MolSystem.Atom, ccp.molecule.ChemComp.ChemAtom

Role name shown: +coordChains. Associations carry multiplicities 1 and *.

CCPN APIs

■ Application Programming Interface
  ● Object oriented
  ● Data accessed in memory as if stored in the data model
■ Implementations come with:
  ● Integrated, transparent I/O (file or database)
  ● Complete validity checking
  ● Protection against casual change (data encapsulation)
  ● Versioning and backwards compatibility
  ● Event notifier system
  ● Slot for application-specific data
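As an illustration of how encapsulated setters, validity checking and an event notifier can work together, here is a minimal sketch. The `Notifier` and `Peak` classes and their method names are hypothetical; the actual CCPN notifier interface is not shown in the slides.

```python
from collections import defaultdict


class Notifier:
    """Registry of callbacks keyed by (class name, action)."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, class_name, action, callback):
        self._callbacks[(class_name, action)].append(callback)

    def notify(self, obj, action):
        for callback in self._callbacks[(type(obj).__name__, action)]:
            callback(obj)


notifier = Notifier()


class Peak:
    """Hypothetical data class whose setter validates, stores, then notifies."""

    def __init__(self, height=0.0):
        self._height = height  # private: changed only through setHeight

    def getHeight(self):
        return self._height

    def setHeight(self, value):
        if not isinstance(value, float):
            raise TypeError("height must be a float")
        self._height = value
        notifier.notify(self, "setHeight")


seen = []
notifier.register("Peak", "setHeight", lambda peak: seen.append(peak.getHeight()))
peak = Peak()
peak.setHeight(2.5)
print(seen)
```

Because every change goes through the setter, a user interface can register a callback once and stay in step with the data without polling.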

Python+XML at runtime

[Figure: runtime architecture. Labels shown:]

User application: Science code, User Interface, Utility functions
Python API: Data get, set. Validity check
XML I/O code: Generic XML read/write
XML I/O mappings: What to do for which element
XML parser
Data Storage (XML files): User data in CCPN XML format

Legend: CCPN code, Off-the-shelf, Application code, files, CCPN generated

Java+DB at runtime

[Figure: runtime architecture. Labels shown:]

Presentation layer
Science code, User Interface, Utility functions
Java API
Hibernate, Hibernate mappings
HQL: Custom queries (Hibernate Query Language)
Optional
Database, Database Schema

Legend: CCPN code, Off-the-shelf, Application code, files, CCPN generated

Now Available

■ Version 2.0 just released
■ Python+XML, Java+XML, C+XML, Java+DB (with Hibernate)
■ Available under GPL license from Sourceforge or www.ccpn.ac.uk
■ CCPN Data Standard:
  ● NMR, Macromolecules, LIMS
  ● 46 packages
  ● 552 classes and data types
  ● Python+XML implementation: 800,000+ lines of code

Memops

● Introduction
● Code generation
● Generated libraries
● Applications of Memops

CcpNmr Suite

■ Analysis
  ● Interactive NMR analysis
■ FormatConverter
  ● Convert between 30+ NMR and structure formats
■ Built on top of CCPN model (Python+XML)
■ Version 2.0 released
■ Widely used in macromolecular NMR


CcpNmr Analysis

ExtendNMR NMR pipeline

■ Integrated macromolecular NMR pipeline - from sample to structure
■ Pre-existing programs from 8 groups
■ In-memory conversion to internal data structures
■ Integrated versions released:
  ● ARIA (NMR structure generation)
  ● Bruker TOPSPIN, manufacturer's processing/analysis package

BIOXDM

■ Software pipeline for on-synchrotron crystallography
  ● Exploit new technology ( goniometers)
  ● Experiment optimisation, acquisition, and on-line processing
■ Independent data model, with Memops machinery
■ Java+DB implementation for runtime concurrent access

EUROCarbDB

■ Distributed deposition database
  ● Glycobiology and glycomics
  ● NMR, MS, HPLC and topology
■ Java. Database storage using Hibernate
■ CCPN model Java+DB implementation slots in as-is

Funding acknowledgements

■ BBSRC CCPN grants
■ European Union grants
  ● EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts
■ Industry support
  ● AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline
● Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’

People

■ Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing)
■ Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova
■ Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett
■ Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004-01195

END

Overview

● Packages
● The Implementation package
  ■ Objects
  ■ DataTypes and DataObjTypes
● Access control

ARIA – structure generation from NMR data

[Figure: data flow between Application, CCPN XML, CCPN Data Model, Custom conversion, ARIA Data Model and ARIA XML.]

■ ARIA imports
  ● Peak Lists
  ● Constraints
  ● Sequences
  ● Chemical shifts
■ ARIA exports
  ● Peak Assignments
  ● Filtered Constraints
  ● Violations
  ● Structures

API functions

■ ‘get’ and ‘set’ (Attributes and links)
■ ‘add’ and ‘remove’ (Collection attributes and links)
■ ‘sorted’ (Unordered collection links)
■ ‘findFirst’ and ‘findAll’ (Collection links)
  ● Simple filtering (attribute == value)
■ create and ‘new’ (Objects)
  ● Normal and ‘factory function’ object creation
■ delete (Objects)
  ● ‘Delete’ function – cascades to objects rendered invalid by deletion
■ checkValid, checkAllValid (Objects)
■ API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
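The function families listed above can be sketched with a toy parent and child pair. This is an illustration only: the classes, the `findFirstResidue`/`findAllResidues` names and the internal `_residues` list are assumptions, not the generated CCPN API.

```python
class Residue:
    """Child object; holds the reverse link to its parent chain."""

    def __init__(self, chain, seqCode, name):
        self.chain = chain            # child -> parent link
        self.seqCode = seqCode
        self.name = name
        self.isDeleted = False
        chain._residues.append(self)  # parent -> child link, kept in step

    def delete(self):
        self.isDeleted = True
        self.chain._residues.remove(self)


class Chain:
    """Parent object with factory, filter and cascading delete functions."""

    def __init__(self, code):
        self.code = code
        self.isDeleted = False
        self._residues = []

    def newResidue(self, seqCode, name):
        """Factory-function style creation from the parent."""
        return Residue(self, seqCode, name)

    def findAllResidues(self, **conditions):
        """Return children whose attributes equal all the given values."""
        return [r for r in self._residues
                if all(getattr(r, key) == value for key, value in conditions.items())]

    def findFirstResidue(self, **conditions):
        matches = self.findAllResidues(**conditions)
        return matches[0] if matches else None

    def delete(self):
        """Deleting the parent cascades to children that would otherwise be invalid."""
        for residue in list(self._residues):
            residue.delete()
        self.isDeleted = True


chain = Chain("A")
ala = chain.newResidue(1, "ALA")
gly = chain.newResidue(2, "GLY")
print(chain.findFirstResidue(name="GLY") is gly)
```

Keeping both directions of the link in memory makes navigation cheap in either direction, at the cost of having to update both ends on every change, which is why the slides describe the classes as strongly coupled.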

FormatConverter - The NMR Translator

[Figure: format-specific readers and format-specific writers arranged around the CCPN Data Model (data model entry). Generic converters shown: generic peak converter, generic chemical shift converter, generic acquisition parameters converter. Data kinds shown: peaks, chemical shifts, acquisition parameters, processing parameters. Formats shown: XEasy, NmrView, Bruker, Varian, NMRPipe, Azara, ...]
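The reader and writer arrangement in this figure can be sketched as two registries around a common in-memory representation. The two toy formats below ("tab" and "comma") are invented for this example and do not correspond to any real NMR file format; the record layout is likewise an assumption.

```python
READERS = {}
WRITERS = {}


def read_tab(text):
    """Format-specific reader: tab-separated 'atom<TAB>shift' lines to common records."""
    records = []
    for line in text.splitlines():
        if line:
            atom, shift = line.split("\t")
            records.append({"atom": atom, "shift": float(shift)})
    return records


def write_comma(records):
    """Format-specific writer: common records to comma-separated lines."""
    return "\n".join(f"{r['atom']},{r['shift']}" for r in records)


READERS["tab"] = read_tab
WRITERS["comma"] = write_comma


def convert(text, source, target):
    """Convert by reading into the common records, then writing them out."""
    records = READERS[source](text)
    return WRITERS[target](records)


print(convert("HA\t4.25\nCA\t52.1", "tab", "comma"))
```

Supporting a new format then means registering one reader and one writer, with no changes to the existing ones.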

ExtendNMR: ARIA

■ Structure generation from macromolecular NMR data, ambiguous distance constraints
■ One of two leading programs
■ Python and scripts, with CNS dynamics engine
■ All input and output integrated to CCPN standard


ARIA: CCPN object selection

ExtendNMR: Bruker TOPSPIN

■ NMR processing program of major NMR instrument company
■ Java. In-memory conversion to CCPN Java+XML implementation
■ CCPN output in current TOPSPIN release, expanded in upcoming release.

Data Model v. Data Format

Abstract model (UML):
  Atom (+elementName: String = C)
  Bond (+bondOrder: Float = 1.0)
  Association: an Atom has * +bonds; a Bond has 2 +atoms

Relational Database:
  Atom (Atom_ID, elementName)
  Atom_Bond_Connect (Bond_ID, Atom_ID)
  Bond (Bond_ID, bondOrder)

XML:
  <Atom ID="AT1" elementName="C">
    <BondList>
      <Bond IDREF="BD1"/>
      .
    </BondList>
  </Atom>

  <Bond ID="BD1" bondOrder="1.0">
    <Atom1 IDREF="AT1"/>
    <Atom2 IDREF="AT2"/>
    .
  </Bond>
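To show one abstract model mapped to two storage formats, here is a small sketch using only the Python standard library. The `Molecule` wrapper element, the example values and the function names are assumptions for illustration; the table and attribute names follow the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One in-memory model: two atoms joined by one bond (values are illustrative).
atoms = {"AT1": "C", "AT2": "C"}
bonds = {"BD1": {"bondOrder": 1.0, "atoms": ("AT1", "AT2")}}


def to_xml(atoms, bonds):
    """Serialise the model as XML with ID/IDREF cross-references."""
    root = ET.Element("Molecule")  # hypothetical wrapper element
    for atom_id, element in atoms.items():
        ET.SubElement(root, "Atom", ID=atom_id, elementName=element)
    for bond_id, bond in bonds.items():
        node = ET.SubElement(root, "Bond", ID=bond_id, bondOrder=str(bond["bondOrder"]))
        for index, atom_id in enumerate(bond["atoms"], start=1):
            ET.SubElement(node, f"Atom{index}", IDREF=atom_id)
    return ET.tostring(root, encoding="unicode")


def to_database(atoms, bonds):
    """Store the same model in three relational tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Atom (Atom_ID TEXT PRIMARY KEY, elementName TEXT)")
    db.execute("CREATE TABLE Bond (Bond_ID TEXT PRIMARY KEY, bondOrder REAL)")
    db.execute("CREATE TABLE Atom_Bond_Connect (Bond_ID TEXT, Atom_ID TEXT)")
    db.executemany("INSERT INTO Atom VALUES (?, ?)", list(atoms.items()))
    for bond_id, bond in bonds.items():
        db.execute("INSERT INTO Bond VALUES (?, ?)", (bond_id, bond["bondOrder"]))
        db.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                       [(bond_id, atom_id) for atom_id in bond["atoms"]])
    return db


xml_text = to_xml(atoms, bonds)
database = to_database(atoms, bonds)
print(xml_text)
```

The application code only ever touches `atoms` and `bonds`; the choice of XML or a database is confined to the two mapping functions.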

Packages

[Figure: package diagram showing ChemElement, ChemComp, Molecule, MolStructure, MolSystem, memops.AccessControl and memops.Implementation.]

Packages

■ Partition model, code, and data
■ Import each other
■ Can be omitted
■ All import Implementation and AccessControl
■ Each have a TopObject
■ No links between data from rival TopObjects (different extents of data)

Root and TopObjects

[Figure: UML class diagram. Classes shown:]

memops.Implementation.MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()

memops.Implementation.TopObject
  +guid: Line
  +getPackageLocator()

ccp.molecule.Molecule.Molecule
ccp.molecule.Molecule.MolResidue
ccp.molecule.ChemComp.ChemComp
ccp.molecule.ChemComp.AbstractChemAtom
ccp.molecule.ChemComp.ChemAtom
ccp.molecule.ChemComp.ChemBond

Role names shown: +currentMolecule, +currentChemComp, +chemAtoms. Associations carry multiplicities 1, 2 and *.

TopObjects

■ One in every package
  ● Ultimate parent to all objects in package
■ Have globally unique identifier (‘guid’)
■ currentXyz links from root
■ Links can constrain links between descendants
■ In file implementations:
  ● Hold links to storage and backup locations
  ● Live in Implementation as almost empty shell
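A rough sketch of a root that tracks top objects by guid and keeps a "current" link per package follows. The class and attribute names are hypothetical, and a random UUID is used as a stand-in for the guid; the real Memops guid format may well differ.

```python
import uuid


class TopObject:
    """Stand-in for the ultimate parent of all objects in one package."""

    def __init__(self, root, package_name):
        self.root = root
        self.package_name = package_name
        # Assumption: any globally unique string will do for this sketch.
        self.guid = uuid.uuid4().hex


class Root:
    """Stand-in for a project root holding top objects and current-package links."""

    def __init__(self):
        self.top_objects = {}   # guid -> top object
        self.current = {}       # package name -> current top object

    def new_top_object(self, package_name):
        top = TopObject(self, package_name)
        self.top_objects[top.guid] = top
        self.current.setdefault(package_name, top)  # first one becomes current
        return top


root = Root()
first = root.new_top_object("Molecule")
second = root.new_top_object("Molecule")
print(root.current["Molecule"] is first)
```

Looking objects up by guid rather than by file name lets a top object be found regardless of where its data are stored.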

Overview

● Packages
● The Implementation package
  ■ Objects
  ■ DataTypes and DataObjTypes
● Access control

CcpNmr Analysis

■ NMR Assignment Program
  ● Inspired by ANSIG and Sparky
  ● Demonstrates CCPN approach
  ● Modern interface and scripting
  ● Scalable and extensible
■ Operating Systems
  ● Linux, Sun, SGI, OSX, Windows
■ Languages
  ● Python
    ■ Data model interaction
    ■ Tk Graphical interface
    ■ Scripting
  ● C
    ■ OpenGL/Tk contours
    ■ Structure display
    ■ Mathematical operations

Implementation Package

■ Model and Code:
  ● Supertypes that define all objects
    ■ Objects
    ■ DataTypes
    ■ DataObjTypes
  ● Basic data types
■ Data – how to access the real data:
  ● Data location pointers
  ● Current-package pointers
  ● Implementation data are not part of the data set, and are not in the database.
  ● Represent view or session?

Data Location

[Figure: UML class diagram. Classes shown:]

MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()

TopObject
  +guid: Line
  +getPackageLocator()

FileStorageObject
  +isLoaded: Boolean
  +isModified: Boolean
  +isReading: Boolean
  +isModifiable: Boolean = True
  +createdBy: Word
  +lastUnlockedBy: Word
  +setIsModifiable()
  +touch()
  +saveTo(repository)
  +removeFrom(repository)
  +save()
  +backup()

Repository
  +name: Line
  +format: StorageFormat = xml
  +url: Url
  +getFileLocation(packageName)

PackageLocator
  +targetName: Word = any

Role names shown: +repositories {ordered}, +activeRepositories, +backedUp, +backup, +stored. Associations carry multiplicities 1, * and 1..*.

Objects and their Supertypes

[Figure: UML class diagram. Classes shown:]

ImplementationObject

MemopsObject
  +isDeleted: Boolean
  +getExpandedKey()

DataObject
  +applicationData: ApplicationData

MemopsRoot
  +name: Word = ccpProject
  +override: Boolean = False
  +currentUserId: Word = user
  +newGuid()
  +getPackageLocator()

FileMemopsRoot
  +saveModified()
  +saveAll()
  +refreshTopObjects(packageName)
  +backupAll()
  +importData(filePath)

DbMemopsRoot

TopObject
  +guid: Line
  +getPackageLocator()

FileTopObject
  +loadFrom(repository)
  +load()
  +restore()

DbTopObject

FileStorageObject
  +isLoaded: Boolean
  +isModified: Boolean
  +isReading: Boolean
  +isModifiable: Boolean = True
  +createdBy: Word
  +lastUnlockedBy: Word
  +setIsModifiable()
  +touch()
  +saveTo(repository)
  +removeFrom(repository)
  +save()
  +backup()

ComplexDataType «DataType»
  +className: Word
  +packageName: Word
  +packageShortName: Word
  +qualifiedName: Line
  +inConstructor: Boolean
  +getQualifiedName()

ccp.molecule.Molecule.Molecule
ccp.molecule.Molecule.MolResidue

Role names shown: +topObject, +root, +currentMolecule. Associations carry multiplicities 1 and *.


Simple Data Types

Boolean DataType

Int DataType

Float DataType

String DataType

Line DataType

Text DataType

Long DataType

Double DataType

Word DataType

PositiveInt DataType

SingleLine DataType

NonNegativeInt DataType

Dict DataType

DateTime DataType

StringKeyDict DataType

Any DataType

Token DataType

NonNegativeFloat DataType

FloatRatio DataType

PositiveFloat DataType

SpacelessString DataType

LongWord DataType

PositiveDouble DataType

NonNegativeDouble DataType

UrlProtocol DataType

Complex Data Types

[Figure: UML class diagram. Classes shown, all marked «DataType»:]

ComplexDataType
  +className: Word
  +packageName: Word
  +packageShortName: Word
  +qualifiedName: Line
  +inConstructor: Boolean
  +getQualifiedName()

MemopsDataTypeObject
  +override: Boolean
  +endOverride()

Url
  +protocol: UrlProtocol = file
  +user: Line
  +password: Line
  +host: Line
  +path: PathString
  +port: Int
  +dataLocation: PathString
  +getDataLocation()

ApplicationData
  +application: Line
  +keyword: Line

AppDataBoolean
  +value: Boolean

AppDataDouble
  +value: Double

AppDataFloat
  +value: Float

AppDataInt
  +value: Int

AppDataLong
  +value: Long

AppDataString
  +value: String
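The application-data slot can be pictured as a list of small value records keyed by application name and keyword. The sketch below borrows the attribute names `application`, `keyword` and `value` from the diagram; the container class and its method names are assumptions for illustration, not the generated API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationData:
    """Value record tagged with the owning application and a keyword."""
    application: str
    keyword: str
    value: object


class ExampleDataObject:
    """Stand-in for a data object with a slot for application-specific data."""

    def __init__(self):
        self._application_data = []

    def addApplicationData(self, app_data):
        self._application_data.append(app_data)

    def findAllApplicationData(self, application, keyword=None):
        """Return the records of one application, optionally filtered by keyword."""
        return [d for d in self._application_data
                if d.application == application
                and (keyword is None or d.keyword == keyword)]


obj = ExampleDataObject()
obj.addApplicationData(ApplicationData("Analysis", "colour", "red"))
obj.addApplicationData(ApplicationData("Analysis", "lineWidth", 2))
obj.addApplicationData(ApplicationData("OtherProgram", "colour", "blue"))
print([d.value for d in obj.findAllApplicationData("Analysis")])
```

Each program can store its own settings alongside the shared data without the model having to know about them, and other programs can simply ignore records tagged with a different application name.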