using standards in real life - biomedbridges...2014/11/03 · biomedbridges/elixir resource...
TRANSCRIPT
Using Standards in Real Life
Helen Parkinson (EMBL-EBI) & Morris Swertz (UMCG)
BioMedBridges AGM Florence, 10-12 March 2014
On behalf of WP3 partners and collaborators
What, how and who?
o What? Addition of scientific value between the ESFRI-BMS Research Infrastructure domains
o Who? For users and developers of the infrastructures, represented by use case pilots within the project and more widely
o How? Cataloguing, review, modification, registration, development and implementation of identifier, content, format and semantic standards supporting:
  • Data exchange
  • Data integration
  • Infrastructure development, delivering new tools and supporting data analysis
Connectivity
Themes: Identifiers
• Standardize identifier usage and drive technical implementation with identifier resources
IC50, mode of action, target, adverse events, clinical trials, etc
Themes: Standards
• Standardize identifier usage
• Support the use of standards and promote interoperability of standards via a registry and the mappings between them
Access to tools, semantic interoperability
• Standardize identifier usage
• Support the use of standards, promote interoperability
• Provide access to tools
• Semantic interoperability (ontologies)
[Figure: ontology fragment with IS-A links among TYPE 1 DIABETES MELLITUS, MODY SYNDROME, RARE INSULIN DEPENDENT DIABETES MELLITUS and METABOLIC DISEASE]
Define:standards?
Respondents classified into three categories across all domains:
• Majority serving data
• All using data
• Some both serving and using data
[Slide build: survey answers NO / YES, then a DATA-to-KNOWLEDGE diagram to which the layers IDENTIFIERS, FORMATS and ONTOLOGIES are added in turn; the label *1033 also appears]
• What is this identifier for? How/where do I convert it?
• How do I convert my format to get this tool to work?
• I want to merge two datasets and co-analyse them, what tools can I use?
• What's the best analysis tool for my problem?
• I need a web tool, I can't install stuff on my desktop
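The first question above ("What is this identifier for?") can be illustrated with a minimal sketch that guesses the source of an identifier from its shape. The pattern table and the function name classify_identifier are assumptions made for illustration, not part of the project; only three well-known identifier shapes are covered, and a real identifier resource would cover far more.

```python
import re

# Hypothetical lookup table (an assumption for this sketch):
# a label for the resource, and the shape its identifiers take.
ID_PATTERNS = {
    "Gene Ontology term": re.compile(r"^GO:\d{7}$"),
    "Ensembl human gene": re.compile(r"^ENSG\d{11}$"),
    "dbSNP reference SNP": re.compile(r"^rs\d+$"),
}


def classify_identifier(identifier: str) -> str:
    """Return the label of the first pattern the identifier matches."""
    token = identifier.strip()
    for label, pattern in ID_PATTERNS.items():
        if pattern.match(token):
            return label
    return "unknown"


if __name__ == "__main__":
    for example in ["GO:0006955", "ENSG00000139618", "rs699", "ABC-1"]:
        print(example, "->", classify_identifier(example))
```

Pattern matching only says what an identifier looks like; converting it to another resource's identifier still needs a mapping service.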
How do I ….?
Gene Ontology Enrichment Analysis: to establish whether a subset of, e.g., genes from a microarray analysis is enriched for some biological function coded using the Gene Ontology, e.g. immune response
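As a rough illustration of what such an enrichment test computes, the sketch below uses a one-sided hypergeometric test, which is one common choice; the slides do not say which statistic any particular tool uses. The function name and the example numbers are assumptions for illustration only.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in a sample.

    population: all genes considered (e.g. every gene on the array)
    annotated:  genes in the population carrying the GO term
    sample:     size of the gene subset of interest
    hits:       genes in the subset carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 20000 genes, 400 annotated "immune response",
    # 100 genes in the subset, 10 of them annotated.
    print(enrichment_p_value(20000, 400, 100, 10))
```

In practice the test is repeated for many GO terms, so the resulting p-values need a multiple-testing correction, which this sketch leaves out.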
BioMedBridges/ELIXIR Resource Registry
• Provides a simple search interface
• Content: 1943 tools etc., 22,232 annotations
  • E.g. URL, text, ontology term: type, formats ...
• Classifies tools using an ontology
  • E.g. Sequence analysis tool
• Download complete content
• Supports a wide scope of tools
• Provides an interface to the literature
• Simple spreadsheet population
• Domain neutral
http://bioregistry.cbs.dtu.dk/
… and the user?
"... when a tool updated its GO data is quite important ..."
"... I used GOrilla in the end as my requirements were pretty simple; a straight enrichment study and the graphics and interface were clean and easy to understand ..."
Clever naming doesn’t help users search for things!
Registry Future
• Building on 7 user engagement workshops: >125 requests for new interface features
• Sustainability via ELIXIR Danish node collaboration
• Federated content sharing between registries and projects, esp. cross domain, e.g. with EuroBioImaging, BioCatalog etc.
• Benchmarking, e.g. comparison of GO tools
• Projects adopting the code for local use
• Addressing interoperability -> automation and an interoperable toolkit
http://bioregistry.cbs.dtu.dk/
Data pooling
Discover and integrate representative populations data sets (cohorts) to validate/recalibrate disease prediction models
[Figure: DATA-to-KNOWLEDGE workflow with the steps DISCOVER TOOLS & DATA, HARMONISE DATA, DERIVE/ACQUIRE DATA, ANALYSE]
Aim 1. Find a cohort that we can test the model on, and that is representative of our local population -> results will be relevant to clinical practice
Aim 2. Find a cohort that has common metadata with the model we wish to test, or has metadata that can be converted
Harmonise Data
Harmonisation Process
• Identify the data elements from the published study to apply to our cohort
• Match the data element, e.g. "Parental diabetes": annotate the elements with ontologies, query expansion and stemming, with string matching to assist the user
  • E.g. Parental diabetes vs. Diabetes mother/father
• Convert the values, e.g. for units
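The matching and conversion steps can be sketched roughly as follows. This is a simplified illustration, not the project's tooling: a small hand-written synonym table stands in for ontology-based query expansion, token normalisation stands in for stemming, and the glucose unit conversion is an assumed example of value conversion. All names in the code are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Hypothetical synonym table standing in for ontology-based query expansion.
SYNONYMS = {"parental": ["mother", "father"]}


def normalise(label: str) -> str:
    """Lower-case the label and sort its tokens so word order is ignored."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", label.lower())))


def expand_query(label: str) -> List[str]:
    """Return the label plus variants with one token swapped for a synonym."""
    tokens = label.lower().split()
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for synonym in SYNONYMS.get(token, []):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants


def match_score(target: str, candidate: str) -> float:
    """Best string similarity between any expanded target and the candidate."""
    cand = normalise(candidate)
    return max(
        SequenceMatcher(None, normalise(variant), cand).ratio()
        for variant in expand_query(target)
    )


def best_match(target: str, candidates: List[str]) -> str:
    """Suggest the cohort data element most similar to the target element."""
    return max(candidates, key=lambda c: match_score(target, c))


def glucose_mg_dl_to_mmol_l(value: float) -> float:
    """Assumed unit-conversion example: glucose, molar mass 180.16 g/mol."""
    return value / 18.016


if __name__ == "__main__":
    cohort_elements = ["Diabetes mother", "Body mass index", "Smoking status"]
    print(best_match("Parental diabetes", cohort_elements))
    print(glucose_mg_dl_to_mmol_l(100.0))
```

The score is only a suggestion to assist the user, as the slide says; a curator still confirms each match before values are converted.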
Derive/Acquire Data
Perform Reproducible Analysis
Data Pooling Conclusions
o Three models assessed for prediction in the Netherlands population
o Models applicable in both populations after calibration
o Easier to do using tools than by hand
o New tools to support this and other pooling scenarios, and other domains
o Standards registry
o Samples integration across infrastructures in BioSamples
o Harmonization tools
o BioBankConnect: new libraries for sample conversion
o Zooma: data/ontology mapping tools
o Access to algorithms via tools registry
60 POSTERS & DEMOs USER ENGAGEMENT
Acknowledgements
• All BMB project partners, personnel and their BMS Infrastructure colleagues
• Registry
  • Kristoffer Rapacki, Emil Rydza, Piotr Chmura (ELIXIR DK Node)
  • Chris Mungall (Gene Ontology) and Anita Bandrowski (NeuroInformatics Framework)
  • BioCatalogue, Carole Goble, Niall Beard, Aleksandra Nenadic (ELIXIR UK Node)
• eTRIKs, TrAIT, Biosharing.org, TransMart, IMPC, RD-Connect, DIACHRON, BioShare (Chao Pang)
• The BioMedBridges project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 284209
• EMBL Core Funds, Parkinson, Birney Teams