©2015 Allotrope Foundation Pharmaceutical companies collaborate on building a Framework to manage Analytical Data more efficiently Dr. Gerhard Noelken Allotrope BoD member, Pfizer Allotrope Liaison GINAS Symposium, Uppsala 7 September 2015


Page 1

Pharmaceutical companies collaborate on building a Framework to manage Analytical Data more efficiently

Dr. Gerhard Noelken
Allotrope BoD member, Pfizer Allotrope Liaison

GINAS Symposium, Uppsala
7 September 2015

Page 2

• MOTIVATION
• THEORY
• REDUCING IT TO PRACTICE

Page 3

What is the problem we are trying to solve?

[Figure: columns DATA SOURCE, DATA, VALUE FROM DATA; rows MUSIC and DATA. Data sources shown: Electronic Lab Notebook, Chromatography Data System, Data Archive]

Page 4

Why is access to music so much easier than access to scientific data?

Think about scientific data...

Material, Equipment, Process, Result

.dat .tbl .HDF .csv .DAML .LCD .XML .jdx .irf .pdid .drdd .asc .cdf .frx .raw

Scientific data is typically stored in a wide variety of non‐standard, proprietary formats…

…with contextual metadata that are hard to find and sometimes inconsistent

...making it costly and sometimes difficult to find and get value from it.

Think about music...

Artist, Album, Song, Genre, Date, Artwork

Music is typically stored in a small number of standard, non‐proprietary formats…

…with contextual metadata that are complete, consistent & correct

…enabling the user to find, share and enjoy it years later from any device easily!

Page 5

What if scientific data was as easy to access as music?

If we...

• Store scientific and process data in a standard format with contextual metadata that is...
  • correct
  • complete
  • consistent
  • compliant

We could...

• Find data in seconds.
• Be confident that the data that underpins our decisions is accurate, complete, and compliant.
• Build data quality and data integrity into the system, eliminating the need for many SOPs and quality investigations.
• Simplify, automate and improve laboratory and manufacturing processes.
• Automatically create technical reports, audit trails, and substantial portions of regulatory submission documents.
• Answer complex questions, not just those accessible via simple queries, by linking data from diverse, disparate sources.

Page 6

Allotrope Foundation

Member Companies
• Subject Matter Experts
• Project Funding

Secretariat
• Project Management
• Legal & Logistical Support

Professional Software Firm
• Framework Development
• Technical Leadership

Partner Network
• Requirements & Specifications
• Contributions, PoC Applications

Member companies: AbbVie, Amgen, Baxter, Bayer, Biogen, Boehringer Ingelheim, Bristol‐Myers Squibb, Eli Lilly, Genentech/Roche, GlaxoSmithKline, Merck & Co., Pfizer

Partner network: ACD/Labs, Agilent, Biovia, BSSN, IDBS, Mestrelab Research, Mettler Toledo, Persistent, Riffyn, Sartorius, Shimadzu, Thermo Scientific, Waters, Erasmus Univ. Med Center, University of Southampton

Page 7

• MOTIVATION
• THEORY
• REDUCING IT TO PRACTICE

Page 8

What is Allotrope Creating?

Allotrope Foundation Framework
• Standard File Format: a file format for any technique or instrument (.adf)
• Open Metadata Repository: standard vocabulary & structure for metadata
• Reusable Software Components: a toolkit that enables use of the standards & metadata in software development

[Diagram: Instrument and New Instrument feed Application 1 and Application 2 through the Framework and the Metadata repository. File formats shown: .dat .tbl .HDF .raw .csv .DAML .LCD .XML .mzML .jdx .irf .pdid .drdd .asc .cdf .frx .etc, alongside .adf]

[Table: example entries with inconsistent metadata]

Project | Test | Instrument
AF 0012354 | IR Fingerprinting | QC Lab #33B 380 FT‐IR
AE0012764 | Bulk & Tapped Density | ASTM Standard Seive #6
AF  12989 | NMR Characterization | AM500
 | Tapped & Bulk Density | Sieve XXX
AF0045674 | Caractérisation RMN | Nouvelle DRX600
AF‐0034558 | IR | iS10 FT‐IR

With the Metadata Repository

Project | Test | Instrument
AF0012354 | IR Fingerprinting | 380 FTIR/‐SN/145453
AF0012764 | Bulk and Tapped Density | ASTM Sieve‐SN/3452
AF0012989 | NMR Characterization | AM500‐SN/0034578
AF0013142 | Bulk and Tapped Density | ASTM Sieve‐SN/09783
AF0045674 | NMR Characterization | DRX600‐SN/10234567
AF0034558 | IR Fingerprinting | iS10 FTIR/‐SN/341980
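The way a standard vocabulary turns the inconsistent "Test" entries into the harmonized ones can be sketched in a few lines of Python. This is only an illustration: the lookup table, the function name normalize_test_name and its error behaviour are assumptions made for this example and are not part of the Allotrope Framework.

```python
# Hypothetical controlled vocabulary: free-text variants -> one standard term.
# The variants come from the example entries; the mapping itself is assumed.
STANDARD_TEST_NAMES = {
    "ir fingerprinting": "IR Fingerprinting",
    "ir": "IR Fingerprinting",
    "bulk & tapped density": "Bulk and Tapped Density",
    "tapped & bulk density": "Bulk and Tapped Density",
    "bulk and tapped density": "Bulk and Tapped Density",
    "nmr characterization": "NMR Characterization",
    "caractérisation rmn": "NMR Characterization",
}


def normalize_test_name(raw: str) -> str:
    """Map a free-text test name to its standard term, or fail loudly."""
    # Lower-case and collapse repeated whitespace before the lookup.
    key = " ".join(raw.lower().split())
    try:
        return STANDARD_TEST_NAMES[key]
    except KeyError:
        raise ValueError(f"unknown test name: {raw!r}") from None


if __name__ == "__main__":
    for entry in ["Tapped & Bulk Density", "Caractérisation  RMN", "IR"]:
        print(entry, "->", normalize_test_name(entry))
```

Rejecting unknown names instead of passing them through is a design choice of this sketch: it keeps new variants from silently entering the data.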

Page 9

ADF: a universal file format for scientific data

Allotrope Data Format
• Data Description: Resource Description Framework (RDF) model. Contains semantic descriptions of:
  • Method, instrument, sample, process, result, etc.
  • Data cube metadata
  • Binary file metadata
• Data Cubes: universal data container. Analytical data represented by one‐ or multidimensional arrays
• Data Package: virtual file system *. Analytical data represented by arbitrary formats, incl. native instrument formats, images, pdf, video, etc.

HDF5: Platform Independent File Format

* Use is optional
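The three parts of an ADF file can be pictured with a small in-memory stand-in. This is a sketch under stated assumptions: the class AdfSketch, its field names and the example values are hypothetical, it does not use HDF5, and it is not the Allotrope API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class AdfSketch:
    """Hypothetical stand-in for one ADF file (not the real API)."""
    # Data Description: semantic statements about method, sample, result, etc.
    data_description: List[Triple] = field(default_factory=list)
    # Data Cubes: named one- or multidimensional numeric arrays.
    data_cubes: Dict[str, List[List[float]]] = field(default_factory=dict)
    # Data Package (optional): arbitrary files stored under a virtual path.
    data_package: Dict[str, bytes] = field(default_factory=dict)


run = AdfSketch()
# A 2D cube: first row is time, second row is detector signal (made-up values).
run.data_cubes["chromatogram"] = [[0.0, 0.5, 1.0], [1.2, 5.8, 1.1]]
# A native instrument file kept alongside the standardized data.
run.data_package["raw/instrument.raw"] = b"\x00\x01\x02"
# The description links the pieces together and gives them meaning.
run.data_description += [
    ("<Run 1>", "type", "<ChromatographyRun>"),
    ("<Run 1>", "hasDataCube", "chromatogram"),
    ("<Run 1>", "hasRawFile", "raw/instrument.raw"),
]
```

Keeping the description separate from the arrays and files mirrors the layering on the slide: the numbers and binaries stay compact, and the meaning lives in the triples.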

Page 10

ADF Class Libraries + Decoupled Taxonomies

[Layered diagram]
• Platform independent file format (HDF 5)
• Data Package API, Data Cube API
• Data Description API (Jena, dotNetRDF)
• Triple Store API
• Analytical Data API
• Taxonomies

Dates shown on the diagram: April 2015, April 2015, Oct 2014, Oct 2014, Jan 2015, June 2015

Languages: Java, C#

Page 11

Allotrope Taxonomies: An Extensible Metadata Model

• A library of extensible taxonomies
  – Uses W3C standards
  – Easy to understand and maintain by SMEs and Vendors
• Start by harvesting existing available concepts
  – PSI‐MS; IUPAC; RSC Chemical Methods Ontology; Dictionary of weighing terms; AnIML, etc.
• Reproducible & efficient collaboration model
  – Leverages knowledge engineers & member company scientists
  – 2‐3 weeks to develop initial version of a new taxonomy
• Initial versions of 12 analytical techniques already implemented:
  – gas chromatography
  – Karl Fischer
  – liquid chromatography
  – mass spectrometry
  – nuclear magnetic resonance spectroscopy
  – thermogravimetric analysis
  – ultra violet spectrometry
  – cell counter
  – cell culture analyzer
  – blood gas analysis
  – balance
  – pH

Page 12

Allotrope Foundation Taxonomies

Page 13

The Big Picture

ADF Data Package

ADF Data Cube

Page 14

The Big Picture

ADF Data Package

ADF Data Cube

Page 15

The Big Picture

ADF Data Package

ADF Data Cube

Page 16

Result

Process

Equipment

Page 17

AF Taxonomies Documentation

Page 18

ADF Class Libraries + Decoupled Taxonomies

[Layered diagram]
• Platform independent file format (HDF 5)
• Data Package API, Data Cube API
• Data Description API (Jena, dotNetRDF)
• Triple Store API
• Analytical Data API
• Taxonomies

Dates shown on the diagram: April 2015, April 2015, Oct 2014, Oct 2014, Jan 2015, June 2015

Languages: Java, C#

Page 19

RDF Data Model

• Subject‐Predicate‐Object (Triple)
• Example:

Subject | Predicate | Object
<Sample 1> | type | <Sample>
<Sample 1> | createdOn | ‘2015‐03‐13’
<Sample 1> | createdBy | <person X>
<Sample 1> | hasBarcode | ‘1234567890’
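The triples from the example can be held as plain tuples and queried by pattern, which is the basic idea behind a triple store. This is a minimal sketch in plain Python; the helper name match is an assumption for illustration, and a real system would use an RDF library such as the Jena or dotNetRDF libraries named in the deck.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The four statements from the slide, one tuple per triple.
triples: List[Triple] = [
    ("<Sample 1>", "type", "<Sample>"),
    ("<Sample 1>", "createdOn", "2015-03-13"),
    ("<Sample 1>", "createdBy", "<person X>"),
    ("<Sample 1>", "hasBarcode", "1234567890"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the given parts (None = wildcard)."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


if __name__ == "__main__":
    # Which sample carries this barcode?
    for s, p, o in match(triples, predicate="hasBarcode", obj="1234567890"):
        print(s)
```

Because every statement has the same three-part form, new kinds of metadata can be added without changing the structure that stores them.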

Page 20

Data Shapes Constrain How We Use Taxonomies in the Real World

– Taxonomies provide an unconstrained vocabulary that we can use to describe things (instances) in our open world and give them a meaning (= what it is)

– We need a mechanism to define data structures (schemas, templates) that describe how to use the taxonomies for a given purpose in a standardized (= reproducible, predictable, verifiable) way

– Shapes Constraint Language (SHACL, expressed as RDF triples) is an emerging standard to do this

http://www.w3.org/2014/data-shapes/charter

Page 21

Using Data Shapes: Equipment

Shape hierarchies define additional constraints

• A system has at least 1 component
• An HPLC system has at least 1 component and has at least 1 column, exactly 1 autosampler and at least 1 detector
• An HPLC‐UV system has at least 1 component and has at least 1 column, exactly 1 autosampler and at least 1 detector and at least 1 UV detector
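The way each shape inherits its parent's constraints and adds its own can be sketched in plain Python. This is an illustration of the idea only, not SHACL syntax: the Shape class, the SUPERTYPE table and the violations function are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed part-type hierarchy: every column, autosampler and detector is a
# component, and a uv-detector is a detector.
SUPERTYPE = {
    "column": "component",
    "autosampler": "component",
    "detector": "component",
    "uv-detector": "detector",
}


def ancestry(part_type: str) -> List[str]:
    """Return the part type followed by all of its supertypes."""
    chain = [part_type]
    while chain[-1] in SUPERTYPE:
        chain.append(SUPERTYPE[chain[-1]])
    return chain


@dataclass
class Shape:
    name: str
    parent: Optional["Shape"] = None
    # part type -> (min count, max count); a max of None means "no upper limit"
    constraints: Dict[str, Tuple[int, Optional[int]]] = field(default_factory=dict)

    def all_constraints(self) -> Dict[str, Tuple[int, Optional[int]]]:
        """Own constraints merged on top of everything inherited."""
        inherited = self.parent.all_constraints() if self.parent else {}
        return {**inherited, **self.constraints}


def violations(shape: Shape, parts: List[str]) -> List[str]:
    """List every constraint of the shape that the given parts break."""
    counts = Counter(t for p in parts for t in ancestry(p))
    problems = []
    for part_type, (low, high) in shape.all_constraints().items():
        n = counts[part_type]
        if n < low or (high is not None and n > high):
            problems.append(f"{shape.name}: found {n} x {part_type}")
    return problems


system = Shape("system", constraints={"component": (1, None)})
hplc = Shape("hplc system", parent=system, constraints={
    "column": (1, None), "autosampler": (1, 1), "detector": (1, None)})
hplc_uv = Shape("hplc-uv system", parent=hplc,
                constraints={"uv-detector": (1, None)})

if __name__ == "__main__":
    print(violations(hplc_uv, ["column", "autosampler", "detector"]))
```

In this sketch a child shape only states what is new; "at least 1 component" is written once on the system shape and applies to every shape below it.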

Page 22

The ADF enables a self‐contained documentation of the data & metadata

[Workflow diagram] Request; Plan Analysis; Prepare Samples; Submit Samples; Control Inst. Acquire Data; Process Data; Analyze Data; Report Results; Store, Archive Data

Legend: Process Step

Page 23

The ADF enables a self‐contained documentation of the data & metadata

Process steps: Plan Analysis; Prepare Samples; Submit Samples; Control Inst. Acquire Data; Process Data; Analyze Data; Report Results; Store, Archive Data

Data & metadata: Request; Analytical Method; Sample Prep Data; Instrument Instruction; Instrument Data; Processed Data; Analyzed Data; Reported Results; Stored Data

Legend: Data & Metadata, Process Step

• Standard data file format for data & metadata
• Output from one system becomes the input to the next
• The APIs enable the use of one vendor agnostic file format

Control Inst. Acquire Data; Process Data; Analyze Data: Interoperability (Plug & Play)

Report & Share; Search & Reuse Data: More automated reporting, Powerful searching

Page 24

Allotrope Data Format

• Data Description: Request, Sample prep, Method, Data acquisition, Instrument instruct.
• Data Cubes: Chromatogram: 2D HDF; Chromatogram: 3D HDF
• Data Package(s) (optional)

Page 25

• MOTIVATION
• THEORY
• REDUCING IT TO PRACTICE

Page 26

Alpha & Beta Release available to AF & APN members now

• Taxonomies (work in progress)
  – Mature versions for:
    • MS, LC, pH, weighing, UV
  – Initial versions of:
    • NMR, cell counter, blood gas, capillary electrophoresis, cell culture analyzer, thermogravimetric analysis, Karl Fischer, GC
  – OWL, OWL + SKOS, Excel and OWLDoc formats
• APIs
  – Data Package, Data Description, Data Cube, Analytical Data
• ADF & API documentation
• New Example Applications for the ADF APIs

Page 27

Current Allotrope Integration Project Dashboard

Columns: Project Type; Description; Company; Status (Idea, Business Case, Scoping, Specification, Agile Execution, Implementation)

Data Converter (companies A, B, C)
• A temporary, expedient solution to transform data into ADF to mitigate obsolescence; enables first step to full adoption

Lab & Plant Automation (companies D, E, F, G, H, I)
• Platform for the planning, execution, analysis & reporting of analytical chemistry leveraging the Framework
• Includes IoT instrument integration; metadata repository/method management; workflow execution; ADF I/O integration with COTS and in-house software
• Enables significant opportunities for automating data flow

Taxonomies (companies H, J)
• Leverage Allotrope taxonomies to provide metadata for enriched index
• Lightweight universal viewer for any technique

Data Lake (company H)
• Repository based on ADF/AT/APIs

CRO Integration (company K)
• Convert at CRO to return raw & processed data back to company in ADF

Page 28

[Timeline: 2012 (Allotrope Foundation), 2013, 2014, 2015, 2016]

Milestones, in the order shown:
• Initiated software development and evaluations
• Established feasibility through PoCs; ADF design & due diligence
• Framework Development; Integration at Members
• Framework used in production; First Public release

Page 29

Thank you!

Networking with Peers: upcoming workshops and meetings
• Sep 15, 2015 (Chicago, IL): Allotrope Partner Network F2F Workshop
• Sep 16, 2015 (Chicago, IL): Cross Industry Workshop

To join or get additional information, contact:

James Vergis, Ph.D.
Science Advisor | Drinker Biddle & Reath LLP
1‐202‐230‐[email protected]@allotrope.org
www.allotrope.org

Page 30

What brought me to Uppsala?

• Meeting Thomas Balzer at an IDMP conference in Berlin
• Allotrope Framework presented as a long‐term option to establish a smooth data flow
  – from analytical data creation in the lab
  – to analytical data reporting, e.g. for Group 4 Specified Substances
• Allotrope closing a gap in terms of missing analytical data standards in the CMC area
• Discussion:
  – Can we identify synergies between the Allotrope Framework for analytical data management in the lab and analytical information in a substance database?

Page 31

Specified Substance Group 4 Analytical Data

These are data that are in our LIMS (Laboratory Information Management System), ELN (Electronic Lab Notebook) and CDS (Chromatography Data Systems) today:
• Raw data from instruments in many different, mainly proprietary data formats
• Processed data (results) with non‐harmonized metadata

ISO/FDIS 11238, 2011

Page 32

Allotrope Framework addresses the gap in standards for CMC analytical data

[Diagram] Drug Development: HL7; HL7: eStability; eCTD; CDISC; CDER Data Standard; Analytical CMC

Allotrope Framework: Class Libraries; Metadata Repository; Workflow Automation; Information Access; Data Standards; Archiving; Service Standards

Data prepared for submission in standard format; Data Review; Data Entry; Analytical Testing

Page 33

September, 2015 Release: Version 1.0

[Release timeline, Jan to Oct 2015]
• ADF Format & API (Data Description API, Data Cube API, Data Package API): Draft, then Version 1.0
• Data Description API: Draft, then Version 1.0
• Data Cube API: Version 1.0
• Data Package API: Version 1.0
• Analytical Data API: Draft, then Version 1.0
• Taxonomies (Equipment, Process, Material, Result, Analytical Techniques): Draft, then Version 1.0
• Alpha release to APN
• Release Version 1.0
