"standards landscape" nif big data 2 knowledge (bd2k) initiative, sep, 2013

Upload: susanna-assunta-sansone

Post on 27-Jan-2015

DESCRIPTION

Overview of the landscape of standards in life sciences for the NIH BD2K "Frameworks for Community-Based Standards Efforts" workshop, September 25-26, 2013. Co-Chairs: Susanna Sansone, PhD and David Kennedy, PhD.

The overall goal of this workshop is to learn what has worked and what has not worked in community-based standards efforts. Participants will have experience in leading specific community-based standards initiatives. Prior to the workshop, participants will be asked to answer in writing specific questions regarding formulating, conducting, and maintaining such efforts. This information will be used to facilitate focused and actionable discussion at the workshop. Issuance of a Request for Information soliciting comment from the broader community on some of the key issues addressed in the workshop is currently envisioned.

Contact: [email protected]
Agenda: Frameworks for Community-Based Standards Efforts (PDF 40.7KB)
Participant List: Roster of Invited Participants (PDF 32KB)
Forum (Join the discussion): http://frameworks.prophpbb.com
Watch Live: http://videocast.nih.gov/summary.asp?live=13088
See more at: http://bd2k.nih.gov/workshops.html#cbse

TRANSCRIPT

Page 1: "Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013

Data Consultant, Honorary Academic Editor

Susanna-Assunta Sansone, PhD Associate Director, Principal Investigator

NIH BD2K Workshop: Frameworks for Community-Based Standards Efforts, Sept 25-26, 2013

Mapping the Landscape of Community Standards

Challenges and Opportunities

www.slideshare.net/SusannaSansone

Page 2:

§  Researchers and bioinformaticians in both academic and commercial arenas, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structure, enrich the description of, and share
•  entities of interest, e.g., genes, metabolites, phenotypes, models
•  experimental steps, e.g., provenance of study materials, technology and measurement types

Growing movement for reproducible research

Page 3:

A community mobilization to develop standards, e.g.:

[Figure labels: de jure, de facto; grass-roots groups; standard organizations; Nanotechnology Working Group]

§  Structural and operational differences
•  organization types (open, closed to members, society, WG etc.)
•  standards development (how to formulate, conduct and maintain)
•  adoption, uptake, outreach (link to journals, funders and commercial sector)
•  funds (sponsors, memberships, grants, volunteering)

Page 4:

Types of reporting standards

•  Including minimum information reporting requirements, or checklists, to report the same core, essential information
•  Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
•  Including a conceptual model or conceptual schema from which an exchange format is derived, to allow data to flow from one system to another
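
As a rough illustration of how the three types of reporting standards complement one another, the sketch below applies a checklist, a controlled vocabulary and a tab-delimited exchange format to a single record. Every name in it (`REQUIRED_FIELDS`, `ORGANISM_TERMS`, the sample records) is hypothetical and is not taken from any real standard; it only shows the division of labour between the three.

```python
import csv
import io

# Hypothetical minimum-information checklist: core fields every report must carry.
REQUIRED_FIELDS = ("sample_id", "organism", "technology")

# Hypothetical controlled vocabulary: the only accepted terms for one field.
ORGANISM_TERMS = {"Homo sapiens", "Mus musculus"}


def check_record(record):
    """Return a list of problems found in one record (empty if compliant)."""
    problems = []
    # Checklist: is the same core information reported?
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    # Vocabulary: is the same word used for the same 'thing'?
    organism = record.get("organism")
    if organism and organism not in ORGANISM_TERMS:
        problems.append(f"organism not in vocabulary: {organism}")
    return problems


def to_exchange_format(records):
    """Serialize records as tab-delimited text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_FIELDS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up example records.
good = {"sample_id": "S1", "organism": "Homo sapiens", "technology": "NMR"}
bad = {"sample_id": "S2", "organism": "human"}
```

Here `check_record(good)` returns an empty list, while `check_record(bad)` reports both the missing field and the out-of-vocabulary term. Only records that pass would normally be handed to `to_exchange_format`.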

Page 5:

[Figure labels: Technologically-delineated views of the world; Biologically-delineated views of the world; arrays, scanning, columns, gels, MS, FTIR, NMR; transcriptomics, proteomics, metabolomics; plant biology, epidemiology, microbiology]

Generic features (‘common core’)
- description of source biomaterial
- experimental design components

Fragmentation, duplications and gaps

To compare and integrate data we need interoperable standards

Page 6:

Growing number of reporting standards

+ 130    + 150    + 303 (Source: BioPortal)

Databases, annotation, curation tools implementing standards

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO

Source: BioSharing

Page 7:

But how much do we know about these standards?

Page 8:

Page 9:

•  A coherent, curated and searchable registry of standards for describing and reporting experiments in life science, environmental, biomedical and biotechnological domains

Page 10:

•  Progressively associate standards with data policies and databases
•  Develop assessment criteria for usability and popularity of standards
•  Help stakeholders to make informed decisions on e.g. what standards or databases to use or recommend; identify efforts they have funded
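
One way to picture such a registry is as a set of records that link each standard to the databases implementing it and the data policies recommending it. The sketch below is an illustration only, not the registry's actual data model: the `StandardRecord` fields, the `search` and `adoption_score` helpers, and all entries are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class StandardRecord:
    """One registry entry; the fields are illustrative, not a real schema."""
    name: str
    kind: str  # e.g. "checklist", "terminology", "format"
    domains: set = field(default_factory=set)
    databases: set = field(default_factory=set)  # databases implementing it
    policies: set = field(default_factory=set)   # data policies recommending it


def search(registry, kind=None, domain=None):
    """Return the sorted names of records matching the given kind and/or domain."""
    hits = []
    for record in registry:
        if kind is not None and record.kind != kind:
            continue
        if domain is not None and domain not in record.domains:
            continue
        hits.append(record.name)
    return sorted(hits)


def adoption_score(record):
    """Crude popularity proxy: number of linked databases and policies."""
    return len(record.databases) + len(record.policies)


# Placeholder entries; names and links are invented for illustration only.
registry = [
    StandardRecord("Checklist-A", "checklist", {"transcriptomics"},
                   {"DB-1", "DB-2"}, {"Policy-X"}),
    StandardRecord("Format-B", "format", {"transcriptomics", "proteomics"},
                   {"DB-1"}),
    StandardRecord("Ontology-C", "terminology", {"metabolomics"}),
]
```

A stakeholder query such as `search(registry, domain="transcriptomics")` would list the candidate standards for that domain, and `adoption_score` hints at how an assessment criterion for popularity might be derived from the recorded links.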

Page 11:

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

Page 12:

Users can claim records and maintain them

Page 13:

Criteria to be used in evaluating standards for adoption

Page 14:

Help prospective users to select and use the appropriate one

Page 15:

Classify and link standards, and visualize relations

Page 16:

Example

The relationship among popular standard formats for pathway information. BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.

CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.

Page 17:

REVIEWS. Drug Discovery Today, Volume 16, Numbers 21/22, November 2011

Empowering industrial research with shared biomedical vocabularies

Lee Harland (1,10), Christopher Larminie (2), Susanna-Assunta Sansone (3), Sorana Popa (4), M. Scott Marshall (5), Michael Braxenthaler (6), Michael Cantor (7), Wendy Filsell (8), Mark J. Forster (9), Enoch Huang (10), Andreas Matern (11), Mark Musen (12), Jasmin Saric (13), Ted Slater (14), Jabe Wilson (15), Nick Lynch (16), John Wise (17) and Ian Dix (18)

1 Connected Discovery Ltd., 27 Old Gloucester Street, London WC1N 3AX, UK
2 GlaxoSmithKline, Computational Biology, 2F157 Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
3 Standards and Data Sharing Infrastructure Team, e-Research Centre, University of Oxford, 7 Keble Rd, Oxford OX1 3QG, UK
4 Knowledge Management and Information Science, R&D Information, AstraZeneca R&D Molndal, 431 83 Molndal, Sweden
5 Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
6 Pharma Research and Early Development, Hoffmann-LaRoche Inc., 340 Kingsland St, Nutley, NJ 07110, USA
7 Pfizer Worldwide Research and Development, 235 E 42nd ST, MS 150/5/60N, New York, NY 10017, USA
8 Unilever R&D, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, UK
9 Syngenta R&D Information Systems, International Research Centre, Jealott’s Hill, Berkshire RG42 6EX, UK
10 Pfizer Worldwide Research and Development, 35 Cambridge Park Drive, Cambridge, MA 02140, USA
11 Thomson Reuters Life Sciences, 22 Thomson Place, Boston, MA 02210, USA
12 Stanford University, Stanford University, 251 Campus Drive, Stanford, CA 94305-5479, USA
13 Scientific Information Centre, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany
14 Merck Sharp & Dohme Corp., 33 Avenue Louis Pasteur, Boston, MA 02115-5727, USA
15 Science & Technology, Corporate Markets, Elsevier Pharma and Biotech Group, Elsevier, 32 Jamestown Road, London NW1 7BY, UK
16 AstraZeneca UK, Alderley Park, Macclesfield SK10 4TG, UK
17 The Pistoia Alliance (footnote 1)
18 Knowledge Management & Information Science, R&D Information, AstraZeneca, 26F17 Mereside, Alderley Park, Macclesfield SK10 4TG, UK

The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.

Introduction

Commercial life science organizations are evolving; they are exploring new mechanisms to adjust to well-documented economic and productivity challenges. At the same time, thanks to the rapid technological advances within biology they are facing an explosion in the volume and complexity of available data. Efficient management, processing and application of internal and external data are vital to research and development productivity [1,2]. Yet, an integrated view across experiments, literature

Corresponding author: Harland, L. ([email protected])
Footnote 1: http://pistoiaalliance.org

940 www.drugdiscoverytoday.com 1359-6446/06/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.drudis.2011.09.013

The information landscape in the industrial sector

[Figure: panels labelled Yesterday, Today and Tomorrow (…evolving…); entities shown: Big Life Science Company, proprietary content provider, public content provider, academic group, software vendor, CRO, service provider, regulatory authorities]

Credit: Pistoia Alliance; Michael Braxenthaler, Roche

Page 18:

Not just technological but also social challenges

§  Ownership of open standards can be problematic in broad, grass-roots collaborations
•  legal framework is still embryonic
•  it requires improved models to encourage maintenance of and contributions to these efforts, supporting their evolution

§  Extensive community liaison needs to be managed and funded
•  rewards and incentives need to be identified for all contributors

Page 19:

Acknowledgements
•  Jessica Tenenbaum
•  Michael Braxenthaler
•  Lee Harland
•  Bryn Williams-Jones
•  Ian Dix
•  Trish Whetzel
•  Mark Musen
•  Collaborators in
   •  OBO Foundry
   •  COSMOS
   •  ISA Commons (especially ISA-Tab-Nano team)
   •  GSC
   •  Metabolomics Society
   •  Data Dryad
   •  Pistoia Alliance
   •  Elixir UK
   •  and many more…