"standards landscape" nif big data 2 knowledge (bd2k) initiative, sep, 2013
DESCRIPTION
Overview of the landscape of standards in the life sciences, presented at the NIH BD2K "Frameworks for Community-Based Standards Efforts" workshop, September 25-26, 2013. Co-Chairs: Susanna Sansone, PhD, and David Kennedy, PhD.
The overall goal of this workshop is to learn what has and has not worked in community-based standards efforts. Participants will have experience in leading specific community-based standards initiatives. Before the workshop, participants will be asked to answer in writing specific questions about formulating, conducting and maintaining such efforts. This information will be used to facilitate a focused and actionable discussion at the workshop. A Request for Information soliciting comment from the broader community on some of the key issues addressed in the workshop is currently envisioned.
Contact: [email protected]
Agenda: Frameworks for Community-Based Standards Efforts (PDF 40.7KB)
Participant List: Roster of Invited Participants (PDF 32KB)
Forum (Join the discussion): http://frameworks.prophpbb.com
Watch Live: http://videocast.nih.gov/summary.asp?live=13088
See more at: http://bd2k.nih.gov/workshops.html#cbse
TRANSCRIPT
Data Consultant, Honorary Academic Editor
Susanna-Assunta Sansone, PhD, Associate Director, Principal Investigator
NIH BD2K Workshop: Frameworks for Community-Based Standards Efforts, Sept 25-26, 2013
Mapping the Landscape of Community Standards
Challenges and Opportunities
www.slideshare.net/SusannaSansone
§ Researchers and bioinformaticians in both academic and commercial arenas, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structure, enrich the description of, and share:
• entities of interest, e.g., genes, metabolites, phenotypes, models
• experimental steps, e.g., provenance of study materials, technology and measurement types
Growing movement for reproducible research
A community mobilization to develop standards, e.g.:
§ Structural and operational differences
• organization types (open, closed to members, society, WG etc.)
• standards development (how to formulate, conduct and maintain)
• adoption, uptake, outreach (links to journals, funders and the commercial sector)
• funds (sponsors, memberships, grants, volunteering)
From de jure (standard organizations) to de facto (grass-roots groups)
Nanotechnology Working Group
Types of reporting standards
• Minimum information reporting requirements, or checklists, to report the same core, essential information
• Controlled vocabularies, taxonomies, thesauri, ontologies etc., to use the same word and refer to the same ‘thing’
• Conceptual models or conceptual schemas, from which an exchange format is derived, to allow data to flow from one system to another
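To make the three types of reporting standards concrete, here is a minimal Python sketch of how they can work together on the same records. It assumes a hypothetical three-field checklist and a toy vocabulary invented for illustration; neither is taken from MIAME, ISA-Tab or any other real standard, and the tab-delimited output is a generic stand-in for an exchange format.

```python
import csv
import io

# Hypothetical minimum-information checklist (illustrative only):
# the core fields every record must report.
CHECKLIST = ("sample_id", "organism_part", "assay_type")

# Hypothetical controlled vocabulary (illustrative only): the allowed values
# for one field, so that everyone uses the same word for the same 'thing'.
VOCABULARY = {"organism_part": {"liver", "kidney", "brain"}}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    # Checklist: every required field must be present and non-empty.
    problems = [f"missing required field: {f}" for f in CHECKLIST if not record.get(f)]
    # Vocabulary: a filled-in value must be one of the agreed terms.
    for field, allowed in VOCABULARY.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


def to_tab_delimited(records: list[dict]) -> str:
    """Serialize records to a simple tab-delimited format, one column per checklist field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CHECKLIST,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",  # fields outside the checklist are dropped
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    good = {"sample_id": "S1", "organism_part": "liver", "assay_type": "transcription profiling"}
    bad = {"sample_id": "S2", "organism_part": "Liver"}
    print(validate(good))  # []
    print(validate(bad))
    print(to_tab_delimited([good]), end="")
```

The split mirrors the slide: the checklist says *what* must be reported, the vocabulary says *which words* to use, and the format says *how* the data travel between systems. Note that "Liver" fails only the vocabulary check, which is exactly the kind of mismatch shared terminologies are meant to prevent.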
Technologically-delineated views of the world
Biologically-delineated views of the world
Generic features (‘common core’): description of source biomaterial; experimental design components
Technologies: arrays and scanning, columns, gels, MS, FTIR, NMR
Domains: transcriptomics, proteomics, metabolomics; plant biology, epidemiology, microbiology
Fragmentation, duplications and gaps
To compare and integrate data we need interoperable standards
Growing number of reporting standards
+ 130, + 150, + 303
Source: BioPortal
Databases, annotation and curation tools implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML, GelML, ISA-Tab, CML, MITAB…
AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…
Source: BioSharing
But how much do we know about these standards?
• A coherent, curated and searchable registry of standards for describing and reporting experiments in life science, environmental, biomedical and biotechnological domains
• Progressively associate standards with data policies and databases
• Develop assessment criteria for the usability and popularity of standards
• Help stakeholders make informed decisions on, e.g., which standards or databases to use or recommend, and identify efforts they have funded
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Users can claim records and maintain them
Criteria to be used in evaluating standards for adoption
Help prospective users to select and use the appropriate one
Classify and link standards, and visualize their relations
Example
The relationship among popular standard formats for pathway information. BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.
CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
REVIEWS Drug Discovery Today • Volume 16, Numbers 21/22 • November 2011
Empowering industrial research with shared biomedical vocabularies
Lee Harland1,10, Christopher Larminie2, Susanna-Assunta Sansone3, Sorana Popa4, M. Scott Marshall5, Michael Braxenthaler6, Michael Cantor7, Wendy Filsell8, Mark J. Forster9, Enoch Huang10, Andreas Matern11, Mark Musen12, Jasmin Saric13, Ted Slater14, Jabe Wilson15, Nick Lynch16, John Wise17 and Ian Dix18
1 Connected Discovery Ltd., 27 Old Gloucester Street, London WC1N 3AX, UK
2 GlaxoSmithKline, Computational Biology, 2F157 Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK
3 Standards and Data Sharing Infrastructure Team, e-Research Centre, University of Oxford, 7 Keble Rd, Oxford OX1 3QG, UK
4 Knowledge Management and Information Science, R&D Information, AstraZeneca R&D Molndal, 431 83 Molndal, Sweden
5 Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, The Netherlands
6 Pharma Research and Early Development, Hoffmann-LaRoche Inc., 340 Kingsland St, Nutley, NJ 07110, USA
7 Pfizer Worldwide Research and Development, 235 E 42nd ST, MS 150/5/60N, New York, NY 10017, USA
8 Unilever R&D, Colworth Science Park, Sharnbrook, Bedfordshire MK44 1LQ, UK
9 Syngenta R&D Information Systems, International Research Centre, Jealott’s Hill, Berkshire RG42 6EX, UK
10 Pfizer Worldwide Research and Development, 35 Cambridge Park Drive, Cambridge, MA 02140, USA
11 Thomson Reuters Life Sciences, 22 Thomson Place, Boston, MA 02210, USA
12 Stanford University, 251 Campus Drive, Stanford, CA 94305-5479, USA
13 Scientific Information Centre, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397 Biberach, Germany
14 Merck Sharp & Dohme Corp., 33 Avenue Louis Pasteur, Boston, MA 02115-5727, USA
15 Science & Technology, Corporate Markets, Elsevier Pharma and Biotech Group, Elsevier, 32 Jamestown Road, London NW1 7BY, UK
16 AstraZeneca UK, Alderley Park, Macclesfield SK10 4TG, UK
17 The Pistoia Alliance¹
18 Knowledge Management & Information Science, R&D Information, AstraZeneca, 26F17 Mereside, Alderley Park, Macclesfield SK10 4TG, UK
The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.
Introduction
Commercial life science organizations are evolving; they are exploring new mechanisms to adjust to well-documented economic and productivity challenges. At the same time, thanks to the rapid technological advances within biology, they are facing an explosion in the volume and complexity of available data. Efficient management, processing and application of internal and external data are vital to research and development productivity [1,2]. Yet, an integrated view across experiments, literature
Corresponding author: Harland, L. ([email protected])
1 http://pistoiaalliance.org
940 www.drugdiscoverytoday.com 1359-6446/06/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.drudis.2011.09.013
The information landscape in the industrial sector: yesterday, today, tomorrow
Actors: Big Life Science Company; proprietary content provider; public content provider; academic group; software vendor; CRO; service provider; regulatory authorities …evolving…
Credit: Pistoia Alliance, Michael Braxenthaler, Roche
Not just technological but also social challenges
§ Ownership of open standards can be problematic in broad, grass-roots collaborations
• the legal framework is still embryonic
• improved models are required to encourage maintenance of, and contributions to, these efforts, supporting their evolution
• extensive community liaison needs to be managed and funded
• rewards and incentives need to be identified for all contributors
Acknowledgements
• Jessica Tenenbaum • Michael Braxenthaler • Lee Harland • Bryn Williams-Jones • Ian Dix • Trish Whetzel • Mark Musen
• Collaborators in: OBO Foundry • COSMOS • ISA Commons (especially the ISA-Tab-Nano team) • GSC • Metabolomics Society • Data Dryad • Pistoia Alliance • Elixir UK • and many more…